How would I filter and append to the rest of the reduce results?

On Sat, Apr 6, 2013 at 4:09 PM, Christian Dahlqvist <christ...@basho.com>
wrote:

> Hi Kartik,
> The reduce phase will normally run recursively a number of times as results 
> come in from map phases on different nodes [1]. This allows Riak to start 
> reducing the results before all data is available and will not require all 
> input data to be stored in memory on the coordinating node. The output from 
> the first iteration will be passed into the following iteration together with 
> the next batch of map phase results. When processing large amounts of data 
> this is the most efficient approach, and reduce functions should be
> written so that they can handle being called this way.
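To see why this matters for the query further down the thread, here is a small plain-JavaScript sketch. `simulateReReduce` is a hypothetical stand-in for Riak's batching. Real batch sizes and ordering are decided by Riak, not by this loop. `naiveReduce` is the reduce function from the original query, unchanged. Because the earlier output `{a: ...}` has no `data` field, the next run skips it. That is one way the results can come out incomplete and different from run to run.

```javascript
// The reduce function from the original query, unchanged: it only keeps
// records that still carry a `data` field (i.e. raw map-phase output).
function naiveReduce(v) {
  var r = {};
  for (var i in v) {
    var doc = v[i];
    if (doc['data'] !== undefined) {
      var age = doc['data']['age_int'];
      if (age !== undefined && age > 10 && age < 25) {
        r[doc['key']] = doc['data'];
      }
    }
  }
  return [r];
}

// Hypothetical stand-in for Riak's behaviour: feed map results to the
// reduce function in batches, prepending the previous reduce output to
// each new batch.
function simulateReReduce(reduceFn, mapResults, batchSize) {
  var acc = [];
  for (var start = 0; start < mapResults.length; start += batchSize) {
    var batch = mapResults.slice(start, start + batchSize);
    acc = reduceFn(acc.concat(batch));
  }
  return acc;
}

var mapResults = [
  { key: 'a', data: { age_int: 12 } },
  { key: 'b', data: { age_int: 30 } },
  { key: 'c', data: { age_int: 18 } },
  { key: 'd', data: { age_int: 24 } }
];

// One batch: every match survives.
console.log(JSON.stringify(simulateReReduce(naiveReduce, mapResults, 4)));
// prints [{"a":{"age_int":12},"c":{"age_int":18},"d":{"age_int":24}}]

// Batches of 2: the first batch's output {a: ...} is skipped on re-reduce.
console.log(JSON.stringify(simulateReReduce(naiveReduce, mapResults, 2)));
// prints [{"c":{"age_int":18},"d":{"age_int":24}}]
```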
> If you know that the number of results will be manageable or the reduce phase 
> requires all results to be processed at once, there is an argument called 
> 'reduce_phase_only_1' [1] that can be specified in order to force the reduce 
> phase to run only once all map phase results are available.
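A sketch of building such a request body in JavaScript. It assumes `reduce_phase_only_1` is set inside the reduce phase's `arg` object; check the exact placement against [1]. `buildQuery` is a hypothetical helper, not part of any Riak client. Serialising the function sources with `JSON.stringify` also escapes their newlines, which strict JSON requires for multi-line strings.

```javascript
// Hypothetical helper: builds a body for POST /mapred with one map phase
// and one reduce phase. Putting reduce_phase_only_1 in the reduce phase's
// "arg" is an assumption to verify against the Riak docs.
function buildQuery(bucket, mapFn, reduceFn) {
  return {
    inputs: bucket,
    query: [
      { map: { language: 'javascript', source: mapFn.toString() } },
      {
        reduce: {
          language: 'javascript',
          source: reduceFn.toString(),
          // Ask Riak to call the reduce function once, over all map results.
          arg: { reduce_phase_only_1: true }
        }
      }
    ]
  };
}

// Example: JSON.stringify escapes the newlines in the function sources.
var body = JSON.stringify(buildQuery(
  'test_rand_docs',
  function (v) { return [{ key: v.key, data: JSON.parse(v.values[0].data) }]; },
  function (v) { return v; }
));
console.log(body);
```

The resulting string could then be posted with the same `curl -X POST ... /mapred` call used later in the thread.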
> In your case, however, you might be able to eliminate the reduce phase
> completely by having the map phase perform the filtering and formatting.
> If a record does not match the criteria, just return [].
> This would give you a list of JSON documents instead of a single one, but 
> would most likely be more efficient.
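A minimal sketch of that map-only approach. It assumes the usual shape of the object Riak passes to a JavaScript map function (`key` plus a `values` array whose first entry holds the stored JSON string in `data`) and ignores siblings. `mapAgeFilter` is a hypothetical name. It returns `[]` for records outside the age range and one `{key: document}` object otherwise.

```javascript
// Map-only filtering: matching objects become one {key: document} entry,
// everything else contributes []. Assumes value.values[0].data holds the
// stored JSON string; siblings (values.length > 1) are ignored.
function mapAgeFilter(value, keyData, arg) {
  var data = JSON.parse(value.values[0].data);
  var age = data.age_int;
  if (age !== undefined && age > 10 && age < 25) {
    var r = {};
    r[value.key] = data;
    return [r];
  }
  return [];
}

// Local check with hand-made objects shaped like Riak map inputs.
var fakeObjects = [
  { key: 'a', values: [{ data: '{"age_int":12}' }] },
  { key: 'b', values: [{ data: '{"age_int":30}' }] },
  { key: 'c', values: [{ data: '{"name":"no age"}' }] }
];
var out = [];
fakeObjects.forEach(function (o) {
  out = out.concat(mapAgeFilter(o, null, null));
});
console.log(JSON.stringify(out)); // [{"a":{"age_int":12}}]
```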
> Best regards,
> Christian
> [1] 
> http://docs.basho.com/riak/1.2.0/references/appendices/MapReduce-Implementation/
> On 6 Apr 2013, at 20:22, Kartik Thakore <kthak...@aimed.cc> wrote:
>> I am not sure what you mean by re-reduce. Can you clarify what you mean? 
>> 
>> I was thinking the reduce phase happens after the whole map phase is done.
>> Do I have to implement a count waiter for the map results in the reduce code?
>> 
>> 
>> On Sat, Apr 6, 2013 at 3:19 PM, Christian Dahlqvist <christ...@basho.com> 
>> wrote:
>> Hi Kartik,
>> 
>> What you are seeing is the result of not accounting for re-reduce in your
>> reduce phase function. 
>> 
>> In Riak, reduce phases generally run recursively, and the input for each
>> run may contain both values from the preceding map phase and output from
>> previous iterations of the reduce phase. For the reduce phase to behave
>> correctly, you will need to distinguish between these two types of input
>> record in your reduce function. 
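One way to make that distinction, sketched under the assumption that map-phase records look like the original query's output (`{key, data}`) while earlier reduce output is a single object keyed by document key. `reduceAgeFilter` is a hypothetical name. Raw records are filtered; previously reduced objects are merged through unchanged, so matches from earlier batches are carried forward instead of dropped.

```javascript
// Re-reduce-tolerant reduce. Two kinds of input record are assumed:
//   raw map output:        {key: ..., data: {...}}  -> filter by age
//   earlier reduce output: {docKey: data, ...}      -> copy through as-is
function reduceAgeFilter(values, arg) {
  var r = {};
  for (var i = 0; i < values.length; i++) {
    var doc = values[i];
    if (doc.data !== undefined && doc.key !== undefined) {
      // Raw map-phase record: keep it only if it matches the filter.
      var age = doc.data.age_int;
      if (age !== undefined && age > 10 && age < 25) {
        r[doc.key] = doc.data;
      }
    } else {
      // Output of a previous reduce iteration: merge every entry.
      for (var k in doc) {
        if (Object.prototype.hasOwnProperty.call(doc, k)) {
          r[k] = doc[k];
        }
      }
    }
  }
  return [r];
}

// Simulated re-reduce: the first output is fed back with the next batch.
var batch1 = [{ key: 'a', data: { age_int: 12 } }, { key: 'b', data: { age_int: 30 } }];
var batch2 = [{ key: 'c', data: { age_int: 18 } }, { key: 'd', data: { age_int: 24 } }];
var first = reduceAgeFilter(batch1);
var second = reduceAgeFilter(first.concat(batch2));
console.log(JSON.stringify(second));
// prints [{"a":{"age_int":12},"c":{"age_int":18},"d":{"age_int":24}}]
```

The `data`/`key` check is a heuristic: it would misclassify a reduced object that happens to contain document keys named `data` and `key`, so tagging the map output explicitly would be more robust.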
>> 
>> Best regards,
>> 
>> Christian
>> 
>> 
>> 
>> 
>> On 6 Apr 2013, at 19:09, Kartik Thakore <kthak...@aimed.cc> wrote:
>> 
>>> Hello,
>>> 
>>> I recently set up a test cluster to host a tech demo web application.
>>> 
>>> I have been seeing some weirdness with the MapReduce functionality.
>>> 
>>> My database is here:
>>> 
>>> http://aimed.cc:8098/riak/rekon/go#/buckets/test_rand_docs
>>> 
>>> The cluster has 5 nodes
>>> ulimit 4096
>>> 
>>> This is the Riak 1.3.0 release on Debian, with 663 of free memory.
>>> 
>>> I am running this map reduce:
>>> 
>>> curl -X POST -H "content-type: application/json" \
>>>     http://aimed.cc:8098/mapred --data @-<<\EOF
>>> {"inputs": "test_rand_docs",
>>> "query":[{"map":{"language":"javascript","source":"
>>>     function (v) {
>>>         var r = {};
>>>         var data = JSON.parse(v.values[0].data);
>>>         r.data = data;
>>>         r.key = v.key;
>>>         return [ r ];
>>>     }
>>> "}},{"reduce":{"language":"javascript","source":"
>>>     function (v) {
>>>         var r = {};
>>>         for( var i in v )
>>>         {
>>>             var doc = v[i];
>>>             if( doc['data'] !== undefined) {
>>>                 var age = doc['data']['age_int'];
>>>                 if ( age !== undefined && age > 10 && age <25 ){
>>>                     r[doc['key']] = doc['data'];
>>>                 }
>>>             }
>>>         }
>>>         return  [ r ];
>>> 
>>>     }
>>> "}}]}
>>> EOF
>>> 
>>> 
>>> My result is, seemingly at random, either:
>>> 
>>> [{"9DYMGV0B6Jdn5DivoTExiqyDYUC":{"age_int":24},"JQYUs2onC822EOzMaToz71j77e":{"age_int":18},"AcrUwotAdYaV5zitaMylnUgYsWY":{"age_int":24}}]
>>> 
>>> 
>>> or
>>> 
>>> [{"LYJpg97ZA5qjZTTv2cfavmRgxLb":{"age_int":11}}]
>>> 
>>> but this Solr query:
>>> 
>>> http://aimed.cc:8098/solr/test_rand_docs/select?q=age_int:[10%20TO%2025]
>>> 
>>> 
>>> shows that there are 134 matching records.
>>> 
>>> So what is going on?
>>> 
>>> 
>>> Is it low memory? Or the fact that it is running on a Xen machine (Linode)?
>>> Is there a scalable-memory server vendor (AWS or similar) I should consider?
>>> 
>>> Thanks,
>>> Kartik Thakore
>>> 
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 