How would I filter and append to the rest of the reduce results?

On Sat, Apr 6, 2013 at 4:09 PM, Christian Dahlqvist <christ...@basho.com> wrote:
> Hi Kartik,
>
> The reduce phase will normally run recursively a number of times as results
> come in from map phases on different nodes [1]. This allows Riak to start
> reducing the results before all data is available and will not require all
> input data to be stored in memory on the coordinating node. The output from
> the first iteration will be passed into the following iteration together with
> the next batch of map phase results. When processing large amounts of data
> this is the most efficient way to do it, and reduce functions should be
> written in a way so that they can handle this.
>
> If you know that the number of results will be manageable or the reduce phase
> requires all results to be processed at once, there is an argument called
> 'reduce_phase_only_1' [1] that can be specified in order to force the reduce
> phase to run only once all map phase results are available.
>
> In your case you might however be able to completely eliminate the reduce
> phase if you made the map phase perform filtering and formatting. If a record
> does not match the criteria, you just return [].
>
> This would give you a list of JSON documents instead of a single one, but
> would most likely be more efficient.
>
> Best regards,
>
> Christian
>
> [1]
> http://docs.basho.com/riak/1.2.0/references/appendices/MapReduce-Implementation/
>
> On 6 Apr 2013, at 20:22, Kartik Thakore <kthak...@aimed.cc> wrote:
>
>> I am not sure what you mean by re-reduce. Can you clarify what you mean?
>>
>> I was thinking the reduce phase happens after the whole map is done. Do I
>> have to implement a count waiter for the map results in the reduce code?
>>
>>
>> On Sat, Apr 6, 2013 at 3:19 PM, Christian Dahlqvist <christ...@basho.com>
>> wrote:
>> Hi Kartik,
>>
>> What you are seeing is a result of you not accounting for re-reduce in your
>> reduce phase function.
>>
>> In Riak, reduce phases generally run recursively, and the input for each run
>> may contain both values from the preceding map phase and output from
>> previous iterations of the reduce phase. In order for the reduce phase to
>> behave correctly you will need to distinguish between the different types of
>> input records in your reduce function.
>>
>> Best regards,
>>
>> Christian
>>
>> On 6 Apr 2013, at 19:09, Kartik Thakore <kthak...@aimed.cc> wrote:
>>
>>> Hello,
>>>
>>> I recently set up a test cluster to try to do a tech demo web application on.
>>>
>>> I have been having some weirdness with the map reduce functionality.
>>>
>>> My database is here:
>>>
>>> http://aimed.cc:8098/riak/rekon/go#/buckets/test_rand_docs
>>>
>>> The cluster has 5 nodes
>>> ulimit 4096
>>>
>>> This is Riak 1.3.0 release on Debian with 663 of free memory.
>>>
>>> I am running this map reduce:
>>>
>>> curl -X POST -H "content-type: application/json" \
>>>     http://aimed.cc:8098/mapred --data @-<<\EOF
>>> {"inputs": "test_rand_docs",
>>>  "query":[{"map":{"language":"javascript","source":"
>>> function (v) {
>>>   var r = {};
>>>   var data = JSON.parse(v.values[0].data);
>>>   r.data = data;
>>>   r.key = v.key;
>>>   return [ r ];
>>> }
>>> "}},{"reduce":{"language":"javascript","source":"
>>> function (v) {
>>>   var r = {};
>>>   for( var i in v )
>>>   {
>>>     var doc = v[i];
>>>     if( doc['data'] !== undefined) {
>>>       var age = doc['data']['age_int'];
>>>       if ( age !== undefined && age > 10 && age < 25 ){
>>>         r[doc['key']] = doc['data'];
>>>       }
>>>     }
>>>   }
>>>   return [ r ];
>>> }
>>> "}}]
>>> }
>>> EOF
>>>
>>> my result is randomly:
>>>
>>> [{"9DYMGV0B6Jdn5DivoTExiqyDYUC":{"age_int":24},"JQYUs2onC822EOzMaToz71j77e":{"age_int":18},"AcrUwotAdYaV5zitaMylnUgYsWY":{"age_int":24}}]
>>>
>>> or
>>>
>>> [{"LYJpg97ZA5qjZTTv2cfavmRgxLb":{"age_int":11}}]
>>>
>>> but it is clear with this:
>>>
>>> http://aimed.cc:8098/solr/test_rand_docs/select?q=age_int:[10%20TO%2025]
>>>
>>> that there are 134 records
>>> ....
>>>
>>> so what is going on?
>>>
>>> Is it low memory? Or that it is on a XEN machine (Linode)? Is there a
>>> scalable memory server vendor (AWS or w/e) I should consider?
>>>
>>> Thanks,
>>> Kartik Thakore
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
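
For reference, here is a rough sketch of what the two suggestions in this thread might look like together: filtering in the map phase (returning [] for non-matching records) and a reduce function that tolerates re-reduce. The function names mapFilterByAge and reduceMergeByKey are made up for illustration and are not part of Riak; the age bounds are copied from the original query. The idea is that the map phase emits objects in the same shape the reduce phase returns (an object keyed by the Riak key), so the reduce only merges objects and does not need to tell map output apart from its own earlier output. Treat it as an untested-on-a-cluster sketch, not a definitive implementation.

```javascript
// Map phase (hypothetical name): parse the stored JSON, drop records that
// do not match, and emit the rest as { <riak key>: <document> }.
function mapFilterByAge(v) {
  var data = JSON.parse(v.values[0].data);
  var age = data.age_int;
  // Same bounds as the original reduce function: 10 < age < 25.
  if (age === undefined || !(age > 10 && age < 25)) {
    return []; // non-matching record contributes nothing
  }
  var out = {};
  out[v.key] = data;
  return [out];
}

// Reduce phase (hypothetical name): merge every input object into one.
// Inputs may be fresh map results or the output of an earlier reduce run;
// both have the same shape, so re-reduce gives the same merged result.
function reduceMergeByKey(values) {
  var merged = {};
  for (var i = 0; i < values.length; i++) {
    var part = values[i];
    if (part === null || typeof part !== 'object') {
      continue; // ignore anything that is not a keyed object
    }
    for (var k in part) {
      if (Object.prototype.hasOwnProperty.call(part, k)) {
        merged[k] = part[k];
      }
    }
  }
  return [merged];
}
```

If a list of documents is acceptable as the result, the reduce phase can be left out entirely and only the map function used, as Christian suggested. Also note that the JavaScript bounds above are exclusive, while a Lucene-style range such as [10 TO 25] is normally inclusive, so the two counts may not match exactly.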