In MongoDB, one should use projections wherever possible. I wanted to quantify two things
```
> db.events.stats()
{
    "ns" : "6ed11a9246404d1b95fe.events",
    "count" : 16912712,
    "size" : 8617342080,
    "avgObjSize" : 509,
    "numExtents" : 25,
    "storageSize" : 9305935856,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 4,
    "totalIndexSize" : 1167933424,
    "indexSizes" : {
        "_id_" : 548732240,
        "notificationId_1" : 14545104,
        "parameters.notificationId_1" : 54296816,
        "eventName_1" : 550359264
    },
    "ok" : 1
}
```
So there are around 17M entries with a total size of about 8.6GB, giving an average object size of 8.6GB / 17M ≈ 509 bytes.
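The average is simply `size / count` from the stats document. A minimal sketch, assuming the two values are copied from the output above; from pymongo, the same stats document should also be obtainable with `database.command("collstats", "events")`. The names `avg_obj_size` and `EVENTS_STATS` are illustrative, not part of any API.

```python
def avg_obj_size(stats: dict) -> int:
    """Average document size in bytes: total data size divided by document count."""
    # Integer division reproduces the 509 reported by db.events.stats() above;
    # the exact quotient is about 509.5.
    return stats["size"] // stats["count"]


# Values copied from the db.events.stats() output above.
EVENTS_STATS = {"count": 16912712, "size": 8617342080}

print(avg_obj_size(EVENTS_STATS))  # 509
```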
I fetched the first three million entries in this collection using a Python program like this:
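```python
from pymongo import MongoClient


class QGMongo(object):
    """Lazily creates and caches a single MongoClient for the process."""

    __conn = None

    @classmethod
    def get_connection(cls):
        if cls.__conn is None:
            cls.__conn = MongoClient('127.0.0.1', 27000)
        return cls.__conn


if __name__ == '__main__':
    conn = QGMongo.get_connection()
    database = conn['6ed11a9246404d1b95fe']
    events = database.events.find()  # no projection: full documents are returned
    count = 0
    for event in events:
        count += 1
        if count == 3000000:  # stop after the first three million documents
            break
```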
This program took around 120s to run.
I set db.setProfilingLevel(2) to profile every operation and then checked the system.profile collection.
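The profiler entries for a cursor (the initial find plus its getMore batches) can be totalled to get the documents returned, bytes sent and server time. A minimal sketch, assuming profiling is already enabled and that entries carry the standard `ns`, `op`, `nreturned`, `responseLength` and `millis` fields; the helper names `summarize_profile` and `read_profile` are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping

# Profiler fields summed per namespace; these appear on query/getmore entries.
SUMMED_FIELDS = ("nreturned", "responseLength", "millis")


def summarize_profile(entries: Iterable[Mapping[str, Any]], ns: str) -> Dict[str, int]:
    """Total the documents returned, bytes sent and server time for one namespace."""
    summary = {"ops": 0, **{field: 0 for field in SUMMED_FIELDS}}
    for entry in entries:
        if entry.get("ns") != ns:
            continue  # skip operations on other collections
        summary["ops"] += 1
        for field in SUMMED_FIELDS:
            summary[field] += entry.get(field, 0)
    return summary


def read_profile(database, ns: str):
    """Return a cursor over find/getMore profiler entries for `ns`.

    `database` is assumed to be a pymongo Database with profiling already enabled.
    """
    return database["system.profile"].find(
        {"ns": ns, "op": {"$in": ["query", "getmore"]}}
    )
```

From pymongo, `database.command("profile", 2)` should enable the same profiling level as the shell call; afterwards something like `summarize_profile(read_profile(database, ns), ns)` with `ns = "6ed11a9246404d1b95fe.events"` totals one run. Clearing or capping `system.profile` between runs keeps the two experiments separate.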
According to the db.system.profile collection:
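I then ran the same program with a projection that returns only eventName (plus _id, which MongoDB includes by default):

```python
from pymongo import MongoClient


class QGMongo(object):
    """Lazily creates and caches a single MongoClient for the process."""

    __conn = None

    @classmethod
    def get_connection(cls):
        if cls.__conn is None:
            cls.__conn = MongoClient('127.0.0.1', 27000)
        return cls.__conn


if __name__ == '__main__':
    conn = QGMongo.get_connection()
    database = conn['6ed11a9246404d1b95fe']
    events = database.events.find({}, {'eventName': 1})  # only eventName (and _id) is returned
    count = 0
    for event in events:
        count += 1
        if count == 3000000:  # stop after the first three million documents
            break
```

This time, the program took around 30s.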
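Both runs can be folded into one timing helper. A minimal sketch, assuming a pymongo `Collection` such as `database.events`; it uses `limit()` instead of the manual counter and, unlike the program above, also excludes `_id` from the projection. The function names `time_iteration` and `compare_projection` are hypothetical.

```python
import time
from itertools import islice
from typing import Iterable, Tuple


def time_iteration(cursor: Iterable, limit: int) -> Tuple[int, float]:
    """Consume at most `limit` items from `cursor`; return (count, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in islice(cursor, limit))
    return count, time.perf_counter() - start


def compare_projection(collection, limit: int = 3_000_000, field: str = "eventName"):
    """Time a full-document scan and a projected scan of `collection`.

    `collection` is assumed to be a pymongo Collection, e.g. database.events.
    """
    # limit() lets the server stop after `limit` documents instead of the
    # client breaking out of the loop.
    full = collection.find({}).limit(limit)
    # Unlike the program above, this projection also drops _id, which MongoDB
    # returns by default, so only `field` is sent back.
    projected = collection.find({}, {field: 1, "_id": 0}).limit(limit)
    return time_iteration(full, limit), time_iteration(projected, limit)
```

Whichever scan runs second may benefit from a warmer cache, so alternating the order or repeating each run gives a fairer comparison.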
According to the db.system.profile collection:
This answers my first question: projections significantly improve query times. When I used a projection that cut the data my program needed to about 15% (1.1GB to 160MB), the running time dropped to 25% (120s to 30s).
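The percentages above follow directly from the measured figures. A small check, treating 1.1GB as 1100MB; it also computes the implied client throughput, which shows that running time fell by less than the data volume did. The names used here are illustrative.

```python
def ratio(after: float, before: float) -> float:
    """Fraction of the original value that remains after the change."""
    return after / before


# Figures from the two runs above; 1.1GB is taken as 1100MB.
DATA_MB = (1100, 160)
SECONDS = (120, 30)

data_ratio = ratio(DATA_MB[1], DATA_MB[0])  # ~0.145, i.e. about 15%
time_ratio = ratio(SECONDS[1], SECONDS[0])  # 0.25, i.e. 25%

# Effective client throughput in MB/s for each run.
throughput = [mb / s for mb, s in zip(DATA_MB, SECONDS)]  # ~[9.17, 5.33]
```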
However, my second question is not well answered. There are two outstanding questions: