Hi OJ,

The do_prereduce parameter makes it possible to have the first iteration of the reduce phase execute on the nodes where the preceding map phase generated its output. As in the example I provided, this can be used to reduce the amount of data that needs to be sent across the network. This is described in greater detail here: http://docs.basho.com/riak/latest/references/appendices/MapReduce-Implementation/
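
For illustration, here is a minimal sketch (not part of the original exchange) that builds the JSON body for a job in which a reduce phase directly follows a map phase, which is the case where do_prereduce applies. It reuses the 'goog' bucket and the $bucket index from the examples quoted below. The mapred_job shell function is hypothetical, and using riak_kv_mapreduce:map_object_value as the map function is an assumption made only to put a map phase in front of the reduce. It also assumes that reduce_count_inputs adds up integer inputs (partial counts) and counts any other input as one, so that pre-reduced results combine correctly.

```shell
# Hypothetical helper: print the JSON body of a MapReduce job that maps
# over every object in the given bucket (selected via the $bucket
# secondary index) and counts the results. The reduce phase sets
# do_prereduce so its first iteration can run where the map output was
# produced. map_object_value is an illustrative choice of map function.
mapred_job() {
  bkt="$1"
  # Unquoted heredoc: $bkt expands to the bucket name, while \$bucket
  # stays the literal index name "$bucket".
  cat <<EOF
{"inputs":{"bucket":"$bkt","index":"\$bucket","key":"$bkt"},
 "query":[{"map":{"language":"erlang",
                  "module":"riak_kv_mapreduce",
                  "function":"map_object_value"}},
          {"reduce":{"language":"erlang",
                     "module":"riak_kv_mapreduce",
                     "function":"reduce_count_inputs",
                     "arg":{"do_prereduce":true}}}]}
EOF
}
```

With a node listening on localhost:8098 as in the thread, the body can be piped straight to curl: mapred_job goog | curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' --data-binary @-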

As it is possible to enable it by default in app.config, it should be fine to always specify it for reduce phases that are preceded by a map phase.

Best regards,

Christian

On 14 Feb 2013, at 12:21, OJ Reeves <o...@buffered.io> wrote:

> Chris,
>
> I've never heard of do_prereduce before. What kind of effect does this have?
> That is, if someone were to use it all the time, regardless of the amount of
> data being returned, would this be a bad thing?
>
> Thanks.
> OJ
>
> On Thu, Feb 14, 2013 at 6:19 PM, Christian Dahlqvist <christ...@basho.com> wrote:
> Hi,
>
> For buckets with a significant number of records, it makes a lot of sense to
> run the example I provided with 'do_prereduce' enabled, as it will result in
> considerably less data being sent between the nodes. This can be enabled as
> follows:
>
> curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>          "bucket":"goog",
>          "index":"$bucket",
>          "key":"goog"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"do_prereduce":true}}}]}'
>
> Best regards,
>
> Christian
>
>
> On 14 Feb 2013, at 08:01, Christian Dahlqvist <christ...@basho.com> wrote:
>
>> Hi Jeremiah,
>>
>> It does indeed not seem to be documented on the main docs site, and I will
>> try to correct this. The only place I have found it described is on the wiki
>> for the Ruby client
>> (https://github.com/basho/riak-ruby-client/wiki/Secondary-Indexes).
>>
>> Below is also an example of a simple MapReduce job that shows how to count
>> the number of records in the 'goog' bucket based on the $bucket secondary
>> index:
>>
>> curl -XPOST http://localhost:8098/mapred \
>>   -H 'Content-Type: application/json' \
>>   -d '{"inputs":{
>>          "bucket":"goog",
>>          "index":"$bucket",
>>          "key":"goog"
>>        },
>>        "query":[{"reduce":{"language":"erlang",
>>                            "module":"riak_kv_mapreduce",
>>                            "function":"reduce_count_inputs"}}]}'
>>
>> I hope this helps.
>>
>> Best regards,
>>
>> Christian
>>
>>
>> On 13 Feb 2013, at 18:12, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
>>
>>> Is this documented anywhere on the docs.basho.com site?
>>>
>>> Searching for $bucket produces search results just for "bucket" and Google
>>> says "No results found for site:docs.basho.com $bucket."
>>>
>>> ---
>>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>>> MCITP: SQL Server 2008, MVP
>>> Cloudera Certified Developer for Apache Hadoop
>>>
>>>
>>> On Wed, Feb 13, 2013 at 10:08 AM, Christian Dahlqvist <christ...@basho.com> wrote:
>>> Hi,
>>>
>>> In addition to the $key index, there is also a $bucket index available by
>>> default. This contains the name of the bucket, and can be used to get all
>>> keys in a specific bucket.
>>>
>>> Best regards,
>>>
>>> Christian
>>>
>>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> --
>
> OJ Reeves
> +61 431 952 586
> http://buffered.io/