In MongoDB, one should use projections wherever possible. I wanted to quantify two things
db.events.stats()
{
    "ns" : "6ed11a9246404d1b95fe.events",
    "count" : 16912712,
    "size" : 8617342080,
    "avgObjSize" : 509,
    "numExtents" : 25,
    "storageSize" : 9305935856,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 4,
    "totalIndexSize" : 1167933424,
    "indexSizes" : {
        "_id_" : 548732240,
        "notificationId_1" : 14545104,
        "parameters.notificationId_1" : 54296816,
        "eventName_1" : 550359264
    },
    "ok" : 1
}
So there are around 17M documents with a total size of 8.6GB, which gives an average object size of 8.6GB / 17M ≈ 509 bytes.
I fetched the first three million entries in this collection using a Python program like this:
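As a quick sanity check on these figures, here is a minimal sketch that redoes the arithmetic from the stats output. The variable names are mine, and I am assuming decimal units (1GB = 10^9 bytes) when rounding.

```python
# Figures copied from the db.events.stats() output above.
count = 16912712       # number of documents
size = 8617342080      # total data size in bytes
index_sizes = {
    "_id_": 548732240,
    "notificationId_1": 14545104,
    "parameters.notificationId_1": 54296816,
    "eventName_1": 550359264,
}

# Average object size is simply total size divided by document count.
avg_obj_size = size / count
total_index_size = sum(index_sizes.values())

print(f"documents: {count / 1e6:.1f}M")
print(f"data size: {size / 1e9:.1f}GB (assuming 1GB = 10**9 bytes)")
print(f"average object size: {avg_obj_size:.1f} bytes")
print(f"total index size: {total_index_size} bytes")
```

The exact quotient is slightly above 509 bytes, and the four index sizes add up to the reported totalIndexSize.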
from pymongo import MongoClient


class QGMongo(object):
    """Lazily creates and caches a single MongoClient."""
    __conn = None

    @classmethod
    def get_connection(cls):
        if cls.__conn is None:
            cls.__conn = MongoClient('127.0.0.1', 27000)
        return cls.__conn


if __name__ == '__main__':
    conn = QGMongo.get_connection()
    database = conn['6ed11a9246404d1b95fe']
    # No projection: every document is returned in full.
    events = database.events.find()
    count = 0
    for event in events:
        count += 1
        if count == 3000000:
            break
This program took around 120s to run.
I set db.setProfilingLevel(2) and then figured out from the system.profile collection that
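A running time like this could be measured with a small helper along the following lines. This is a minimal sketch and not necessarily how the number above was obtained: time_fetch is a hypothetical name, and it accepts any iterable so that it can be tried without a running server.

```python
import time
from itertools import islice


def time_fetch(cursor, limit):
    """Read at most `limit` items from `cursor`; return (items_read, seconds)."""
    start = time.perf_counter()
    # islice stops after `limit` items, like the count/break loop above.
    fetched = sum(1 for _ in islice(cursor, limit))
    elapsed = time.perf_counter() - start
    return fetched, elapsed


# Stand-in iterable so the sketch runs on its own.
# With PyMongo (assumed usage) one would pass a cursor instead, e.g.
#   time_fetch(database.events.find(), 3000000)
fetched, elapsed = time_fetch(iter(range(1000)), 100)
print(f"read {fetched} items in {elapsed:.6f}s")
```

Because the helper only relies on iteration, the same call works for a cursor with or without a projection, which keeps the two measurements comparable.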
As per db.system.profile collection,
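I then ran the same program with a projection that returns only the eventName field:
from pymongo import MongoClient


class QGMongo(object):
    """Lazily creates and caches a single MongoClient."""
    __conn = None

    @classmethod
    def get_connection(cls):
        if cls.__conn is None:
            cls.__conn = MongoClient('127.0.0.1', 27000)
        return cls.__conn


if __name__ == '__main__':
    conn = QGMongo.get_connection()
    database = conn['6ed11a9246404d1b95fe']
    # Projection: return only eventName (plus _id, which is included by default).
    events = database.events.find({}, {'eventName': 1})
    count = 0
    for event in events:
        count += 1
        if count == 3000000:
            break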
This time, the program took around 30s.
As per db.system.profile collection,
As a result, I have the answer to my first question: query times improve significantly with projections. When I used a projection that cut the data required by my program to around 15% of the original (1.1GB to 160MB), the running time dropped to 25% (120s to 30s).
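The ratios quoted above can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures from this post; the dictionary layout and names are mine, and I am assuming decimal units (1GB = 10^9 bytes, 1MB = 10^6 bytes).

```python
docs = 3_000_000  # documents fetched in each run

# Rounded figures from the two runs (assumed decimal units).
full = {"bytes": 1.1e9, "seconds": 120}        # no projection
projected = {"bytes": 160e6, "seconds": 30}    # eventName only

data_ratio = projected["bytes"] / full["bytes"]
time_ratio = projected["seconds"] / full["seconds"]
print(f"data transferred: {data_ratio:.1%} of the original")
print(f"running time:     {time_ratio:.1%} of the original")

# Per-document cost and effective throughput of each run.
for name, run in (("full", full), ("projected", projected)):
    per_doc_us = run["seconds"] / docs * 1e6
    mb_per_s = run["bytes"] / run["seconds"] / 1e6
    print(f"{name}: {per_doc_us:.0f} microseconds/doc, {mb_per_s:.1f} MB/s")
```

The time shrinks less than the data does, so the effective throughput in MB/s is lower for the projected run. One possible reading is that part of the cost is per document rather than per byte, but these numbers alone do not prove that.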
However, my second question is not well answered. There are two outstanding questions: