Hi Mattias,

When Riak executes a MapReduce job, it will determine a covering set of 
vnodes/partitions that will handle the processing. The vnodes selected will 
vary between runs, and if the partitions do not all hold the same data, there 
may be differences in the results between consecutive runs.
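
To illustrate the effect, here is a toy model in Python. It is only a sketch and not Riak's actual coverage planner: the ring size, the key names and hashes, and the helper functions are all made up for the example. It assumes three replicas per key and picks every third partition as the covering set, so that each key is read from exactly one replica.

```python
# Toy model only: this is NOT Riak's coverage planner, just an illustration.
RING_SIZE = 12  # hypothetical number of partitions
N_VAL = 3       # replicas per key


def preflist(key_hash):
    """Partitions that should hold a key: N_VAL consecutive ring positions."""
    return [(key_hash + i) % RING_SIZE for i in range(N_VAL)]


def covering_set(offset):
    """Every N_VAL-th partition, so each preflist contains exactly one member."""
    return set(range(offset % N_VAL, RING_SIZE, N_VAL))


def build_store(keys):
    """Place every key on all partitions in its preflist."""
    store = {p: set() for p in range(RING_SIZE)}
    for name, key_hash in keys.items():
        for p in preflist(key_hash):
            store[p].add(name)
    return store


def run_job(store, offset):
    """Return the keys seen when only the covering set is read (no quorum)."""
    return sorted(k for p in covering_set(offset) for k in store[p])


keys = {"a": 0, "b": 4, "c": 7}   # hypothetical keys and hash positions
store = build_store(keys)
store[5].discard("b")             # one replica of "b" is out of sync

results = {offset: run_job(store, offset) for offset in range(N_VAL)}
for offset, found in results.items():
    print(offset, found)
```

In this model, two of the three possible covering sets return all keys, while the one that happens to include partition 5 misses "b". Once the missing replica is repaired, every covering set returns the same result.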

- How many records do you have in the bucket? 

- How large a portion of the results typically differs between runs?

- What do the MapReduce jobs look like?
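
If it helps with the second question, something along these lines could be used to run the same job several times and measure how much each run is missing. This is a rough sketch, not a tested tool: the function names are made up, and the URL assumes a node listening on localhost with the default HTTP port 8098.

```python
import json
import urllib.request


def run_mapreduce(job, url="http://localhost:8098/mapred"):
    """POST a MapReduce job (a dict) to Riak's HTTP interface.

    The URL is an assumption; adjust host and port for your cluster.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def missing_fractions(runs):
    """For each run, the fraction of the union of all results that it lacks."""
    # Serialise each result item so that JSON objects can be put in a set.
    as_sets = [{json.dumps(item, sort_keys=True) for item in run} for run in runs]
    union = set().union(*as_sets)
    if not union:
        return [0.0 for _ in as_sets]
    return [len(union - s) / len(union) for s in as_sets]
```

For example, collecting `runs = [run_mapreduce(job) for _ in range(10)]` and passing the list to `missing_fractions` gives one number per run, where 0.0 means that run returned everything seen in any of the runs.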

Best regards,

Christian 



On 12 Apr 2013, at 11:05, Mattias Sjölinder <matt...@sjolinder.se> wrote:

> Thanks for your response, Christian!
> 
> We have AAE enabled in our cluster. The thing that bothers me most is 
> that even if I get the expected dataset once from my MapReduce job, a 
> consecutive identical request often returns a subset of the result even 
> though no changes have been made to the bucket. The same thing seems to 
> happen for all our MapReduce queries. The bucket in the example has both 
> allow_mult and last_write_wins set to false.
> 
> Regards
> Mattias
> 
> 
> 2013/4/12 Christian Dahlqvist <christ...@basho.com>
> Hi Mattias,
> 
> MapReduce in Riak executes based on the data in a single partition and does, 
> for efficiency reasons, not perform a quorum read (which greatly reduces the 
> required amount of network traffic). As Riak is eventually consistent, it is 
> possible that all partitions do not hold exactly the same data or version of 
> the data at any point in time. What you are seeing could very well be a 
> result of all replicas of some data not being in sync across all partitions 
> holding a copy.
> 
> This would however be corrected either through read-repair or AAE (Active 
> Anti-Entropy) if you have this enabled. If you were to perform a GET on a key 
> that is missing, triggering read-repair, I would expect it to consistently 
> show up in the results from that point on, at least until it is updated again.
> 
> Best regards,
> 
> Christian
> 
> 
> 
> On 12 Apr 2013, at 08:13, Mattias Sjölinder <matt...@sjolinder.se> wrote:
> 
>> Hi
>> 
>> I am struggling to get a grip on MapReduce and why it sometimes returns 
>> only a subset of what is expected. Is it that the nodes processing the 
>> map phase return the matches found so far after a specific time? I would 
>> rather have it return a timeout than a subset of the actual matches.
>> 
>> An example is this simple MapReduce:
>> 
>> {
>>     "inputs":{
>>         "bucket":"som-bucket",
>>         "index":"userid_bin",
>>         "key":"18481123123"
>>     },
>>     "query":[
>>         {
>>             "map":{
>>                 "language":"javascript",
>>                 "name":"Riak.mapValuesJson",
>>                 "keep":true
>>             }
>>         }
>>     ]
>> }
>> 
>> 
>> Any thoughts?
>> 
>> Regards
>> Mattias
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 

