MongoDB - slow '$group' performance
I have a MongoDB collection of more than 1,000,000 records. Each record is around 20 KB in size (so the total storage size is about 20 GB).
I have a 'type' field in the collection (which can take roughly 10 different values), and I would like to get a per-type counter for the collection. There is an index on the 'type' field.
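For reference, here is a minimal sketch of the setup described above. The database and collection names (my_db, my_colc) are taken from the snippets below; the exact index definition is an assumption (a plain ascending single-field index on 'type'):

    from pymongo import MongoClient, ASCENDING

    client = MongoClient()            # local MongoDB 2.6 instance
    my_db = client['my_db']           # database name is an assumption
    my_colc = my_db['my_colc']        # collection of ~1,000,000 documents

    # single-field ascending index on 'type' (roughly 10 distinct values)
    my_colc.create_index([('type', ASCENDING)])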
I have tested two different approaches (pardon the Python syntax):
1. A naive approach - a 'count' call for each value:

    counter = {}
    for type_val in my_db.my_colc.distinct('type'):
        counter[type_val] = my_db.my_colc.find({'type': type_val}).count()
2. Using the aggregation framework with a '$group' stage:

    counter = my_db.my_colc.aggregate([{'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}])
The performance I'm getting with the first approach is roughly 2 orders of magnitude better than with the second one (about 1 min versus 45 min). This is probably because the counting runs only on the index, without accessing the documents, whereas $group has to go over the documents one by one.
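One way to check this (my own sketch, not part of the original post, reusing the my_db/my_colc handles) is to ask the server for the query plans. On MongoDB 2.6 the per-type find() should report a 'BtreeCursor type_1' cursor, i.e. an index scan, while the bare $group pipeline (explain requested through the raw aggregate command) should show it reading the whole collection rather than the index:

    # plan for the per-type count query; 'some_type' is just a placeholder value
    print(my_db.my_colc.find({'type': 'some_type'}).explain()['cursor'])

    # plan for the $group-only pipeline, via the raw aggregate command
    print(my_db.command('aggregate', 'my_colc',
                        pipeline=[{'$group': {'_id': '$type',
                                              'agg_val': {'$sum': 1}}}],
                        explain=True))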
Is there a way to run an efficient group query that uses only the index, thus achieving the performance of approach #1 but with the aggregation framework?
I am using MongoDB 2.6.1.
Update: there is an open MongoDB Jira issue about this.
In the aggregation pipeline, the $group stage does not use any indexes. It is applied after $match, which can actually use indexes to speed it up.
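As an illustration (my own sketch, not from the answer): if only a subset of type values is of interest, putting a $match stage first lets the optimizer use the index on 'type' to narrow the input before $group runs over the remaining documents. The type values here are placeholders:

    # $match is the first stage, so the 'type' index can select the input documents
    pipeline = [
        {'$match': {'type': {'$in': ['type_a', 'type_b']}}},
        {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}},
    ]
    result = my_db.my_colc.aggregate(pipeline)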
Cheers,