MongoDB - slow '$group' performance
I have a MongoDB collection of more than 1,000,000 records. Each record is around 20 KB in size (so the total storage size is about 20 GB).
I have a 'type' field in the collection (which can take roughly 10 different values), and I would like to get a per-type counter for the collection. There is an index on the 'type' field.
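For reference, here is a minimal sketch of the setup described above. The database and collection names (my_db, my_colc) are taken from the snippets below; the exact index definition is an assumption (a plain ascending single-field index on 'type'):

    from pymongo import MongoClient, ASCENDING

    client = MongoClient()            # local MongoDB 2.6 instance
    my_db = client['my_db']           # database name is an assumption
    my_colc = my_db['my_colc']        # collection of ~1,000,000 documents

    # single-field ascending index on 'type' (roughly 10 distinct values)
    my_colc.create_index([('type', ASCENDING)])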
I have tested two different approaches (pardon the Python syntax):
1. A naive approach - a 'count' call for each value:

    counter = {}
    for type_val in my_db.my_colc.distinct('type'):
        counter[type_val] = my_db.my_colc.find({'type': type_val}).count()
2. Using the aggregation framework with a '$group' stage:

    counter = my_db.my_colc.aggregate([{'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}])
The performance I'm getting with the first approach is roughly 2 orders of magnitude better than with the second one (about 1 min versus 45 min). This is probably because the counting runs only on the index, without accessing the documents, whereas $group has to go over the documents one by one.
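One way to check this (my own sketch, not part of the original post, reusing the my_db/my_colc handles) is to ask the server for the query plans. On MongoDB 2.6 the per-type find() should report a 'BtreeCursor type_1' cursor, i.e. an index scan, while the bare $group pipeline (explain requested through the raw aggregate command) should show it reading the whole collection rather than the index:

    # plan for the per-type count query; 'some_type' is just a placeholder value
    print(my_db.my_colc.find({'type': 'some_type'}).explain()['cursor'])

    # plan for the $group-only pipeline, via the raw aggregate command
    print(my_db.command('aggregate', 'my_colc',
                        pipeline=[{'$group': {'_id': '$type',
                                              'agg_val': {'$sum': 1}}}],
                        explain=True))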
Is there a way to run an efficient group query that uses only the index, thus achieving the performance of approach #1 but with the aggregation framework?
I am using MongoDB 2.6.1.
Update: there is an open MongoDB Jira issue about this.
In the aggregation pipeline, the $group stage does not use any indexes. It is applied after $match, which can actually use indexes to speed it up.
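As an illustration (my own sketch, not from the answer): if only a subset of type values is of interest, putting a $match stage first lets the optimizer use the index on 'type' to narrow the input before $group runs over the remaining documents. The type values here are placeholders:

    # $match is the first stage, so the 'type' index can select the input documents
    pipeline = [
        {'$match': {'type': {'$in': ['type_a', 'type_b']}}},
        {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}},
    ]
    result = my_db.my_colc.aggregate(pipeline)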
Cheers,