In my current project I need to run calculations on data stored in MongoDB using the MapReduce algorithm from PHP Laravel (REST services). I will publish my ideas on how to do it here.

  • map-reduce operation
  • mapReduce prototype form
  • SQL table to NoSQL MongoDB MapReduce
  • Example1: Calculate RatingSnapshot and Total Quantity with Average Quantity Per Rating
  • Example2: Calc num_ratings
  • Example3: Places in each network
  • Example4: Pivot Data with Map reduce
  • php - MongoDB::command (Execute a database command) Examples

Diagram of the annotated map-reduce operation

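In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs. For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the emitted values into a single result per key. The mapReduce command has the following prototype form: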

db.runCommand(
               {
                 mapReduce: <collection>,
                 map: <function>,
                 reduce: <function>,
                 out: <output>,
                 query: <document>,
                 sort: <document>,
                 limit: <number>,
                 finalize: <function>,
                 scope: <document>,
                 jsMode: <boolean>,
                 verbose: <boolean>
               }
             )
 
// Usage of the mapReduce command
var mapFunction = function () { ... };
var reduceFunction = function (key, values) { ... };
 
db.runCommand(
               {
                 mapReduce: 'orders',
                 map: mapFunction,
                 reduce: reduceFunction,
                 out: { merge: 'map_reduce_results', db: 'test' },
                 query: { ord_date: { $gt: new Date('01/01/2012') } }
               }
             )
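
To make the data flow concrete, here is a minimal in-memory sketch of the query, map, group, and reduce phases in plain JavaScript. It is not how MongoDB implements the command; the helper name simulateMapReduce and the sample order fields (cust_id, amount) are assumptions made up for illustration.

```javascript
// In-memory imitation of the mapReduce data flow (illustrative only).
// simulateMapReduce and the sample order fields are hypothetical.
function simulateMapReduce(docs, options) {
  var groups = {};                       // key -> array of emitted values
  var emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };

  docs.filter(options.query || function () { return true; })
      .forEach(function (doc) {
        // MongoDB binds `this` to the current document; emit is passed in
        // explicitly here because there is no server-side global.
        options.map.call(doc, emit);
      });

  return Object.keys(groups).sort().map(function (key) {
    var values = groups[key];
    // MongoDB does not call reduce for a key that has a single value.
    var value = values.length === 1 ? values[0] : options.reduce(key, values);
    return { _id: key, value: value };
  });
}

var orders = [
  { cust_id: "A", amount: 10, ord_date: new Date("2012-03-01") },
  { cust_id: "A", amount: 5,  ord_date: new Date("2012-04-01") },
  { cust_id: "B", amount: 7,  ord_date: new Date("2011-12-31") },
  { cust_id: "B", amount: 3,  ord_date: new Date("2012-06-15") }
];

var results = simulateMapReduce(orders, {
  query: function (doc) { return doc.ord_date > new Date("2012-01-01"); },
  map: function (emit) { emit(this.cust_id, this.amount); },
  reduce: function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
});

console.log(JSON.stringify(results));
// prints [{"_id":"A","value":15},{"_id":"B","value":3}]
```

The order dated 2011-12-31 is dropped by the query, so key "B" ends up with a single value and reduce is never invoked for it.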

The customer wants to have a collection like this:

RatingSnapshot: [
{
  location_id: "12345abcde",
  network: "google",
  timestamp: ISODate("2014-09-23T11:34:00.123Z"),
  rating: 2.1,
  num_ratings: 6
},
{
  location_id: "12345abcde",
  network: "yelp",
  timestamp: ISODate("2014-09-23T11:34:00.123Z"),
  rating: 2.2,
  num_ratings: 7
}
...
];

My proposal:

RatingSnapshot:
{
  location_id: "12345abcde",
  timestamp: ISODate("2014-09-23T11:34:00.123Z"),
  advertizer: "McDonald's",
  ratings: [
    {
      network: "google",
      rating: 2.1,
      num_ratings: 6
    },
    {
      network: "yelp",
      rating: 2.2,
      num_ratings: 7
    }
    ...
  ]
}
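
One way to get from the customer's flat layout to the nested one is to group the flat documents by location_id and timestamp. The sketch below does that in plain JavaScript. The function name nestSnapshots is hypothetical, timestamps are kept as strings for brevity, and the advertizer field is left out because the flat layout does not carry it.

```javascript
// Group flat per-network snapshots into one document per location and timestamp.
// nestSnapshots is a hypothetical helper, not part of any MongoDB API.
function nestSnapshots(flatDocs) {
  var byKey = {};
  var order = [];                        // remembers first-seen order of groups

  flatDocs.forEach(function (doc) {
    var key = doc.location_id + "|" + doc.timestamp;
    if (!byKey[key]) {
      byKey[key] = {
        location_id: doc.location_id,
        timestamp: doc.timestamp,
        ratings: []
      };
      order.push(key);
    }
    byKey[key].ratings.push({
      network: doc.network,
      rating: doc.rating,
      num_ratings: doc.num_ratings
    });
  });

  return order.map(function (key) { return byKey[key]; });
}

var flat = [
  { location_id: "12345abcde", network: "google",
    timestamp: "2014-09-23T11:34:00.123Z", rating: 2.1, num_ratings: 6 },
  { location_id: "12345abcde", network: "yelp",
    timestamp: "2014-09-23T11:34:00.123Z", rating: 2.2, num_ratings: 7 }
];

var nested = nestSnapshots(flat);
console.log(JSON.stringify(nested, null, 2));
```

Both flat documents share the same location and timestamp, so they collapse into a single nested document with two entries in ratings.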

One more picture describing SQL to NoSQL MongoDB commands (MySQL to MongoDB)

Basic Example1: Calculate RatingSnapshot and Total Quantity with Average Quantity Per Rating

This map-reduce operation runs on the RatingSnapshots collection over all documents whose timestamp value is greater than 01/10/2014. The operation groups by the ratings.network field and calculates, for each network, the number of rating entries and their total num_ratings. It concludes by calculating the average num_ratings per rating entry for each network. Define the map function to process each input document:

  • In the function, this refers to the document that the map-reduce operation is processing.
  • For each element of the ratings array, the function associates the network with a new object value that contains a count of 1 and the element's num_ratings, and emits the network and value pair.

var mapFunction = function () {
    for (var idx = 0; idx < this.ratings.length; idx++) {
        var key = this.ratings[idx].network;
        var value = {
            count: 1,
            num_ratings: this.ratings[idx].num_ratings
        };
        emit(key, value);
    }
};

Define the corresponding reduce function with two arguments keyNetwork and countObjVals:

  • countObjVals is an array whose elements are the objects that the map function emitted for the grouped keyNetwork value.
  • The function reduces the countObjVals array to a single object reducedVal that contains the count and the num_ratings fields.
  • In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the num_ratings field contains the sum of the num_ratings fields from the individual array elements.

var reduceFunction = function (keyNetwork, countObjVals) {
    // Declared with var so the accumulator does not leak into the global scope.
    var reducedVal = { count: 0, num_ratings: 0 };

    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
        reducedVal.num_ratings += countObjVals[idx].num_ratings;
    }

    return reducedVal;
};

Then define a finalize function with two arguments, key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object.

var finalizeFunction = function (key, reducedVal) {
    reducedVal.avg = reducedVal.num_ratings / reducedVal.count;
    return reducedVal;
};
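
As a quick sanity check of the arithmetic, the reduce and finalize steps can be run by hand in plain JavaScript, outside MongoDB. The three input values below are made-up sample data, not results from the real collection.

```javascript
// Same logic as the reduce and finalize functions described above.
var reduceFunction = function (keyNetwork, countObjVals) {
  var reducedVal = { count: 0, num_ratings: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.num_ratings += countObjVals[idx].num_ratings;
  }
  return reducedVal;
};

var finalizeFunction = function (key, reducedVal) {
  reducedVal.avg = reducedVal.num_ratings / reducedVal.count;
  return reducedVal;
};

// Three hypothetical values emitted by the map function for "google".
var emitted = [
  { count: 1, num_ratings: 6 },
  { count: 1, num_ratings: 7 },
  { count: 1, num_ratings: 2 }
];

var reduced = reduceFunction("google", emitted);
var finalized = finalizeFunction("google", reduced);

console.log(JSON.stringify(finalized));
// prints {"count":3,"num_ratings":15,"avg":5}
```

Keeping the division in finalize rather than in reduce matters because reduce may run more than once per key, and averaging partial averages would give wrong results.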

Perform the map-reduce operation on the RatingSnapshots collection using the mapFunction, reduceFunction, and finalizeFunction functions.

db.RatingSnapshots.mapReduce(
    mapFunction,
    reduceFunction,
    {
        out: { merge: "map_reduce_output_num_rating" },
        query: { timestamp: { $gt: new Date('01/10/2014') } },
        finalize: finalizeFunction
    }
)

This operation uses the query field to select only those documents with a timestamp greater than new Date('01/10/2014'). It then writes the results to a collection named map_reduce_output_num_rating. If the map_reduce_output_num_rating collection already exists, the operation merges the existing contents with the results of this map-reduce operation.
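
The merge output mode can be pictured as an upsert by _id: results whose key already exists in the output collection overwrite the old document, new keys are added, and keys missing from the new results stay untouched. The sketch below imitates that with plain arrays; mergeOutput and the sample documents are hypothetical.

```javascript
// Imitates out: { merge: ... } on plain arrays (illustrative only).
function mergeOutput(existing, results) {
  var byId = {};
  existing.forEach(function (doc) { byId[doc._id] = doc; });
  // A new result with the same _id replaces the existing document.
  results.forEach(function (doc) { byId[doc._id] = doc; });
  return Object.keys(byId).sort().map(function (id) { return byId[id]; });
}

// Hypothetical contents of the output collection before the run.
var existing = [
  { _id: "google", value: { count: 1, num_ratings: 6, avg: 6 } },
  { _id: "fb",     value: { count: 2, num_ratings: 8, avg: 4 } }
];

// Hypothetical results of the new map-reduce run.
var fresh = [
  { _id: "google", value: { count: 3, num_ratings: 15, avg: 5 } },
  { _id: "yelp",   value: { count: 1, num_ratings: 7,  avg: 7 } }
];

var merged = mergeOutput(existing, fresh);
console.log(merged.map(function (doc) { return doc._id; }).join(","));
// prints fb,google,yelp
```

If old and new values should be combined rather than overwritten, MongoDB also offers the out: { reduce: ... } mode, which applies the reduce function to the existing and the new value for each key.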

Basic Example2: Calc num_ratings

{
    network: 'google',
    rating: 3.4,
    num_ratings: 2
}

Here we have a rating document for the ‘google’ network with a num_ratings value of 2. Now we want to find the total num_ratings each network has accumulated across the entire num_ratings collection. It’s a problem easily solved with map-reduce.

Mapping

As its name suggests, map-reduce essentially involves two operations. The first, specified by our map function, formats our data as a series of key-value pairs. Our key is the network’s name (this makes sense only if the name is unique). Our value is a document containing the num_ratings count. We generate these key-value pairs by emitting them. See below:

// Our key is the network's name;
// our value, the num_ratings of the current document.
var map = function () {
    emit(this.network, {num_ratings: this.num_ratings});
};

When we run map-reduce, the map function is applied to each document. This results in a collection of key-value pairs. What do we do with these results? It turns out that we don’t even have to think about them because they’re automatically passed on to our reduce function.

Reducing

Specifically, the reduce function will be invoked with two arguments: a key and an array of values associated with that key. Returning to our example, we can imagine our reduce function receiving something like this:

reduce('google', [{num_ratings: 2}, {num_ratings: 1}, {num_ratings: 4}]);
 
// It is easy to come up with a reduce function for tallying num_ratings:
// add up all the num_ratings for each key.
var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(doc) {
        sum += doc.num_ratings;
    });
    return {num_ratings: sum};
};
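
One property worth checking: MongoDB may call reduce more than once for the same key, feeding earlier reduce outputs back in as values. That is why the function returns {num_ratings: sum}, the same shape that map emits. A small plain-JavaScript check, reusing the three sample values from the imagined reduce call above:

```javascript
// Same reduce function as above, repeated so the snippet runs on its own.
var reduce = function (key, values) {
  var sum = 0;
  values.forEach(function (doc) {
    sum += doc.num_ratings;
  });
  return { num_ratings: sum };
};

var values = [{ num_ratings: 2 }, { num_ratings: 1 }, { num_ratings: 4 }];

// Reduce everything in one pass.
var onePass = reduce("google", values);

// Reduce a partial batch first, then feed that result back in (a re-reduce).
var partial = reduce("google", values.slice(0, 2));
var twoPass = reduce("google", [partial, values[2]]);

console.log(onePass.num_ratings, twoPass.num_ratings);
// prints 7 7
```

If reduce returned a bare number instead of a document, the second pass would read doc.num_ratings from a number and the sum would break.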

Results

From the shell, we pass our map and reduce functions to the mapReduce helper.

// Running mapReduce.
var op = db.num_ratings.mapReduce(map, reduce, {out: "mr_results"});
{
    "result" : "mr_results",
    "timeMillis" : 8,
    "counts" : {
        "input" : 6,
        "emit" : 6,
        "output" : 2
    },
    "ok" : 1
}
 
// Getting the results from the shell
db[op.result].find();
{ "_id" : "yelp", "value" : { "num_ratings" : 21 } }
{ "_id" : "google", "value" : { "num_ratings" : 13 } }

Basic Example3: Places in each network

We want to end up with a "networks" collection that has documents that look like this:

{"_id" : "Google", "value" : 4}
{"_id" : "Yelp", "value" : 2}

Emit each network in the map function, then count them in the reduce function.

1 Map. The map function first checks whether there is a rating.network field, as running a for-loop on an undefined value would cause an error. Once that has been established, we go through each element, emitting the network name and a count of 1:

map = function () {
    // Skip documents that have no rating.network array.
    if (!this.rating || !this.rating.network) {
        return;
    }

    for (var index in this.rating.network) {
        emit(this.rating.network[index], 1);
    }
}

2 Reduce. For the reduce function, we initialize a counter to 0 and then add each element of the values array to it. Then we return the final count.

reduce = function (key, values) {
    var count = 0;

    for (var index in values) {
        count += values[index];
    }

    return count;
}

3 Call the mapReduce command

result = db.runCommand({
    "mapReduce" : "RatingSnapshots",
    "map" : map,
    "reduce" : reduce,
    "out" : "networks"})

db.networks.find()
{"_id" : "Google", "value" : 4}
{"_id" : "Yelp", "value" : 2}

Basic Example4: Pivot Data with Map reduce

You have a collection of Places, each with an array of networks in rating.network. You want to generate a collection of networks with an array of Places in each.

db.Places.insert({ Place: "123asd", rating: { network: ['Google', 'Yelp', '4square'] } });
db.Places.insert({ Place: "adf134", rating: { network: ['Google', 'Yelp', 'fb'] } });

We need to loop through each network in the Place document and emit each network individually. The catch here is in the reduce phase. We cannot return an array from the reduce phase, so we must build a Places array inside of the "value" document that is returned.

map = function () {
  for (var i in this.rating.network) {
    var key = { location: this.rating.network[i] };
    var value = { Places: [ this.Place ] };
    emit(key, value);
  }
}

reduce = function (key, values) {
  // Wrap the array in a document because reduce must not return a bare array.
  var Place_list = { Places: [] };
  for (var i in values) {
    Place_list.Places = values[i].Places.concat(Place_list.Places);
  }
  return Place_list;
}
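
To see what the pivot produces, the two functions can be exercised in plain JavaScript with a small stand-in for the server's emit. This is only an illustration: the driver code and the pivot variable are hypothetical, keys are serialized with JSON.stringify for grouping, and the sample documents nest the array as rating: { network: [...] }.

```javascript
// Stand-in for the server-side emit: collects values per serialized key.
var emitted = {};
function emit(key, value) {
  var k = JSON.stringify(key);
  (emitted[k] = emitted[k] || []).push(value);
}

var map = function () {
  for (var i = 0; i < this.rating.network.length; i++) {
    emit({ location: this.rating.network[i] }, { Places: [this.Place] });
  }
};

var reduce = function (key, values) {
  // The array is wrapped in a document, matching the shape emitted by map.
  var placeList = { Places: [] };
  for (var i = 0; i < values.length; i++) {
    placeList.Places = placeList.Places.concat(values[i].Places);
  }
  return placeList;
};

var places = [
  { Place: "123asd", rating: { network: ["Google", "Yelp", "4square"] } },
  { Place: "adf134", rating: { network: ["Google", "Yelp", "fb"] } }
];

// Map phase: bind `this` to each document, as MongoDB does.
places.forEach(function (doc) { map.call(doc); });

// Reduce phase: one call per distinct key.
var pivot = {};
Object.keys(emitted).forEach(function (k) {
  var key = JSON.parse(k);
  pivot[key.location] = reduce(key, emitted[k]).Places;
});

console.log(JSON.stringify(pivot));
```

Google and Yelp each end up with both places, while 4square and fb each end up with one.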

php - MongoDB::command (Execute a database command) Examples

//Finding all of the distinct values for a key.
$people = $db->people;
 
$people->insert(array("name" => "Joe", "age" => 4));
$people->insert(array("name" => "Sally", "age" => 22));
$people->insert(array("name" => "Dave", "age" => 22));
$people->insert(array("name" => "Molly", "age" => 87));
 
$ages = $db->command(array("distinct" => "people", "key" => "age"));
 
foreach ($ages['values'] as $age) {
    echo "$age\n";
}
 
//Finding all of the distinct values for a key, where the value is larger than or equal to 18.
$people = $db->people;
 
$people->insert(array("name" => "Joe", "age" => 4));
$people->insert(array("name" => "Sally", "age" => 22));
$people->insert(array("name" => "Dave", "age" => 22));
$people->insert(array("name" => "Molly", "age" => 87));
 
$ages = $db->command(
    array(
        "distinct" => "people",
        "key" => "age", 
        "query" => array("age" => array('$gte' => 18))
    )
);  
 
foreach ($ages['values'] as $age) {
    echo "$age\n";
}
 
//Get all users with at least one "sale" event, and how many times each of these users has had a sale.
 
// sample event document
$events->insert(array("user_id" => $id, 
    "type" => $type, 
    "time" => new MongoDate(), 
    "desc" => $description));
 
// construct map and reduce functions
$map = new MongoCode("function ( ) { emit(this.user_id,1); }");
$reduce = new MongoCode("function ( k, vals ) { ".
    "var sum = 0;".
    "for (var i in vals) {".
        "sum += vals[i];". 
    "}".
    "return sum; }");
 
$sales = $db->command(array(
    "mapreduce" => "events", 
    "map" => $map,
    "reduce" => $reduce,
    "query" => array("type" => "sale"),
    "out" => array("merge" => "eventCounts")));
 
$users = $db->selectCollection($sales['result'])->find();
 
foreach ($users as $user) {
    echo "{$user['_id']} had {$user['value']} sale(s).\n";
}
