Hi Kartik,

The reduce phase will normally run recursively a number of times as results 
come in from map phases on different nodes [1]. This allows Riak to start 
reducing the results before all data is available and will not require all 
input data to be stored in memory on the coordinating node. The output from the 
first iteration will be passed into the following iteration together with the 
next batch of map phase results. When processing large amounts of data this is 
the most efficient way to do it, and reduce functions should be written in a 
way so that they can handle this.

If you know that the number of results will be manageable or the reduce phase 
requires all results to be processed at once, there is an argument called 
'reduce_phase_only_1' [1] that can be specified in order to force the reduce 
phase to run only once all map phase results are available.

In your case you might however be able to completely eliminate the reduce phase 
if you made the map phase perform filtering and formatting. If a record does 
not match the criteria, you just return [].

This would give you a list of JSON documents instead of a single one, but would 
most likely be more efficient.

Best regards,

Christian

[1] 
http://docs.basho.com/riak/1.2.0/references/appendices/MapReduce-Implementation/


On 6 Apr 2013, at 20:22, Kartik Thakore <kthak...@aimed.cc> wrote:

> I am not sure have you mean by re-reduce. Can you clarify what you mean? 
> 
> I was thinking the reduce phase happens after the whole map is done. Do I 
> have to implement a count waiter for the map results in the reduce code?
> 
> 
> On Sat, Apr 6, 2013 at 3:19 PM, Christian Dahlqvist <christ...@basho.com> 
> wrote:
> Hi Kartik,
> 
> What you are seeing is a result of you not accounting for re-reduce in you 
> reduce phase function. 
> 
> In Riak reduce phases generally run recursively and the input for each run 
> may contain both values from preceding map phase as well as output from 
> previous iterations of the reduce phase. In order for the reduce phase to 
> behave correctly you will need to distinguish between the different types of 
> input records in your reduce function. 
> 
> Best regards,
> 
> Christian
> 
> 
> 
> 
> On 6 Apr 2013, at 19:09, Kartik Thakore <kthak...@aimed.cc> wrote:
> 
>> Hello,
>> 
>> I recently setup a test cluster to try to do a tech demo web application on.
>> 
>> I have been having some weirdness with the map reduce functionality.
>> 
>> My database is here:
>> 
>> http://aimed.cc:8098/riak/rekon/go#/buckets/test_rand_docs
>> 
>> The cluster has 5 nodes
>> ulimit 4096
>> 
>> This is Riak 1.3.0 release on Debian with 663 of free memory.
>> 
>> I am running this map reduce:
>> 
>> curl -X POST -H "content-type: application/json" \
>>     http://aimed.cc:8098/mapred --data @-<<\EOF
>> {"inputs": "test_rand_docs",
>> "query":[{"map":{"language":"javascript","source":"
>>     function (v) {
>>         var r = {};
>>         var data = JSON.parse(v.values[0].data);
>>         r.data = data;
>>         r.key = v.key;
>>         return [ r ];
>>     }
>> "}},{"reduce":{"language":"javascript","source":"
>>     function (v) {
>>         var r = {};
>>         for( var i in v )
>>         {
>>             var doc = v[i];
>>             if( doc['data'] !== undefined) {
>>                 var age = doc['data']['age_int'];
>>                 if ( age !== undefined && age > 10 && age <25 ){
>>                     r[doc['key']] = doc['data'];
>>                 }
>>             }
>>         }
>>         return  [ r ];
>> 
>>     }
>> "}}]
>> 
>> 
>> my result is randomly:
>> 
>> [{"9DYMGV0B6Jdn5DivoTExiqyDYUC":{"age_int":24},"JQYUs2onC822EOzMaToz71j77e":{"age_int":18},"AcrUwotAdYaV5zitaMylnUgYsWY":{"age_int":24}}]
>> 
>> 
>> or
>> 
>> [{"LYJpg97ZA5qjZTTv2cfavmRgxLb":{"age_int":11}}]
>> 
>> but it is clear with this:
>> 
>> http://aimed.cc:8098/solr/test_rand_docs/select?q=age_int:[10%20TO%2025]
>> 
>> 
>> that there are 134 records ....
>> 
>> so what is going on?
>> 
>> 
>> Is it low memory? Or that it is on a XEN machine (Linode)? Is there a 
>> scaleable memory server vendor (AWS or w/e) I should consider?
>> 
>> Thanks,
>> Kartik Thakore
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to