No F*cking Idea

Common answer to everything

Using Map Reduce With Mongodb


MongoDB supports running MapReduce queries in addition to its regular SQL-like query interface. The documentation says MapReduce is not the best choice for regular, interactive queries. It is, however, very good for background work such as preparing reports or caching data. I will show a simple example of how to write a useful MapReduce query and run it.

JavaScript

MapReduce queries in MongoDB are written in JavaScript. All you have to do is prepare two regular JavaScript functions: a map function and a reduce function.

var map = function(){
  /* called once for each document, which is available as this;
     emit zero or more key/value pairs from it */
  emit(this.url, {count: 1});
}

In the map function you emit key/value pairs from each document. For example, for each document you can emit its url as the key and a count as the value.
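To make the map step concrete, here is a small sketch that runs without MongoDB. It assumes plain Node.js. The helper runMapPhase and the local emit stand-in are hypothetical, because inside MongoDB the server provides emit. The helper calls a Mongo-style map function once per document, with this bound to that document, and groups the emitted values by key. That grouped list of values per key is what reduce later receives.

```javascript
// Hypothetical helper (not part of MongoDB's API): runs a Mongo-style map
// function over plain objects and groups the emitted values by key.
function runMapPhase(docs, map) {
  const groups = new Map(); // key -> array of values emitted for that key
  const previousEmit = globalThis.emit;
  // Stand-in for the emit() that MongoDB provides inside map functions.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    // map is called once per document, with `this` bound to that document.
    for (const doc of docs) map.call(doc);
  } finally {
    // Restore whatever emit was before (normally nothing).
    if (previousEmit === undefined) delete globalThis.emit;
    else globalThis.emit = previousEmit;
  }
  return groups;
}

// A map function in the same style as the post: key = url, value = count of 1.
const map = function () {
  emit(this.url, { count: 1 });
};

const docs = [
  { url: "www.google.com" },
  { url: "www.no-fucking-idea.com" },
  { url: "www.google.com" },
];

const groups = runMapPhase(docs, map);
for (const [url, values] of groups) {
  console.log(url, values.length);
}
// www.google.com 2
// www.no-fucking-idea.com 1
```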

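var reduce = function(key, values){
  /* values holds everything emitted for key (possibly including results of
     earlier reduce calls); combine them into one value of the same shape */
  var result = {count: 0};
  values.forEach(function(v){ result.count += v.count; });
  return result;
}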

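In the reduce function you gather the values emitted for a single key and combine them, for example by summing them up. It is easier to think about if you imagine reduce as a fold (or inject, depending on your background) over the values emitted by the map function. Keep in mind that MongoDB may call reduce more than once for the same key and pass earlier results back in along with new values. For that reason, reduce must return a value with the same shape as the values emitted by map.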
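Reduce can be called again on its own earlier output (a "re-reduce"), so a reducer has to give the same answer however its input is split into batches. The sketch below imitates a re-reduce by hand in plain Node.js, with no database. The values for one key are reduced in two batches, and then the batch results are reduced again. The names sumReduce, tallyReduce, oneShot and reReduced are hypothetical. tallyReduce counts how many values it receives (res += 1) rather than summing their count fields.

```javascript
// Sums the partial counts; output shape {count: n} matches the input shape.
const sumReduce = function (key, values) {
  let res = 0;
  values.forEach(function (v) { res += v.count; });
  return { count: res };
};

// Counts how many values it was given, ignoring what they contain.
const tallyReduce = function (key, values) {
  let res = 0;
  values.forEach(function () { res += 1; });
  return { count: res };
};

// Three values emitted for one key.
const emitted = [{ count: 1 }, { count: 1 }, { count: 1 }];

// Reduce everything in a single call.
function oneShot(reduce) {
  return reduce("k", emitted).count;
}

// Reduce the first two values, then reduce that partial result with the third.
function reReduced(reduce) {
  const partial = reduce("k", emitted.slice(0, 2));
  return reduce("k", [partial, emitted[2]]).count;
}

console.log(oneShot(sumReduce), reReduced(sumReduce));     // 3 3
console.log(oneShot(tallyReduce), reReduced(tallyReduce)); // 3 2
```

Only the summing reducer gives the same result both ways. That is the property MongoDB relies on when it splits the reduce work.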

Running scripts

The mongo shell has a really nice interface for running scripts. Let's examine a simple example:

mongo localhost:27017/canis_production generate_report.js

This runs the generate_report.js script against the canis_production database on the node at localhost:27017. You don't have to use a script file, but it is easier to write the functions into a file once than to type them in every time ;).

Example Map reduce query

Here is a simple mapReduce that actually does something. For each document it emits the url field as the key and a count of 1 as the value. The reducer adds up the counts for each key, which tells us how many times each url occurs across the whole collection.

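var map = function(){
  /* key: the document's url, value: a partial count of 1 */
  emit(this.url, {count: 1});
}
var reduce = function(key, values){
  /* add up the partial counts; returning the same {count: n} shape as the
     emitted values lets MongoDB safely pass this result back into reduce */
  var res = 0;
  values.forEach(function(v){ res += v.count; });
  return {count: res};
}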

This is how we define our map and reduce functions. Now all we need to do is run the query.

db.sites.mapReduce(map, reduce, { out: "mapped_urls" });

To run the query we call the mapReduce method on a collection (this example uses a collection named “sites”). The first argument is the map function, the second is the reduce function, and the third is an options document. Its out option is very useful: it names the output collection where the results are stored as documents. This lets us run the query at, e.g., night and look at the results in the morning :).

Let's test it

First, some sample data:

> db.sites.insert({url: "www.google.com", date: new Date(), trash_data: 5 });
> db.sites.insert({url: "www.no-fucking-idea.com", date: new Date(), trash_data: 13 });
> db.sites.insert({url: "www.google.com", date: new Date(), trash_data: 1 });
> db.sites.insert({url: "www.no-fucking-idea.com", date: new Date(), trash_data: 69 });
> db.sites.insert({url: "www.no-fucking-idea.com", date: new Date(), trash_data: 256 });

Now the functions and the query:

> var map = function(){
...   emit(this.url, {count: 1});
... }
> var reduce = function(key, values){
...   var res = 0;
...   values.forEach(function(v){ res += v.count; });
...   return {count: res};
... }
> db.sites.mapReduce(map, reduce, { out: "mapped_urls" });
{
  "result" : "mapped_urls",
  "timeMillis" : 75,
  "counts" : {
      "input" : 5,
      "emit" : 5,
      "reduce" : 2,
      "output" : 2
  },
  "ok" : 1
}

And the results:

> db.mapped_urls.find({})
{ "_id" : "www.google.com", "value" : { "count" : 2 } }
{ "_id" : "www.no-fucking-idea.com", "value" : { "count" : 3 } }

Worked perfectly ;)
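You can also cross-check the result above without a server. The sketch below is an in-memory stand-in for db.sites.mapReduce, assuming plain Node.js. simulateMapReduce is a hypothetical helper and not part of any MongoDB API. It runs the same kind of map and reduce over the five sample documents and tallies input, emit, reduce and output the way the counts section does. Like MongoDB, it skips reduce for a key that has only one emitted value. This sketch calls reduce once per remaining key, whereas the real server may split the work differently.

```javascript
// Hypothetical in-memory stand-in for collection.mapReduce(map, reduce, ...).
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const counts = { input: docs.length, emit: 0, reduce: 0, output: 0 };
  const previousEmit = globalThis.emit;
  // Stand-in for MongoDB's emit(): collect values per key and count emits.
  globalThis.emit = (key, value) => {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // `this` = current document
  } finally {
    if (previousEmit === undefined) delete globalThis.emit;
    else globalThis.emit = previousEmit;
  }
  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    // A single emitted value is passed through without calling reduce.
    if (values.length > 1) {
      value = reduce(key, values);
      counts.reduce += 1;
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length;
  return { results, counts };
}

// The sample documents from the post (date field left out).
const sites = [
  { url: "www.google.com", trash_data: 5 },
  { url: "www.no-fucking-idea.com", trash_data: 13 },
  { url: "www.google.com", trash_data: 1 },
  { url: "www.no-fucking-idea.com", trash_data: 69 },
  { url: "www.no-fucking-idea.com", trash_data: 256 },
];

const map = function () {
  emit(this.url, { count: 1 });
};
const reduce = function (key, values) {
  let res = 0;
  values.forEach(function (v) { res += v.count; });
  return { count: res };
};

const { results, counts } = simulateMapReduce(sites, map, reduce);
console.log(JSON.stringify(results));
// [{"_id":"www.google.com","value":{"count":2}},{"_id":"www.no-fucking-idea.com","value":{"count":3}}]
console.log(JSON.stringify(counts));
// {"input":5,"emit":5,"reduce":2,"output":2}
```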

Docs

You can find more information on the MapReduce interface in the MongoDB documentation: http://www.mongodb.org/display/DOCS/MapReduce
