I am still experimenting with this issue (and I'll write a more detailed
report on it), but here are my findings so far:

1. This behavior appears only when a large number of keys is returned
by the secondary index filter. I am able to successfully execute the
same map-reduce set of functions if I adjust the secondary index
filter to return a smaller subset.

2. When I switched to an Erlang map function but kept the original
Javascript reduce function, I was still getting the same behavior.
Switching to a very simple Erlang-based map-reduce set of functions
works, and I am able to complete the query in under 9 seconds.

Here's the set of functions I used in my testing:

    {"map": 
{"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"}
    },
       {"reduce": {"language":"erlang",
"module":"riak_kv_mapreduce","function":"reduce_sort","arg":"filter_notfound"}}


So I believe the bottleneck that kills these queries is Javascript VM overhead.
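For completeness, here is a sketch of the full query those two phases were embedded in, assembled in Python so the JSON is easy to check. The bucket and index names come from the query quoted below; the range bounds are placeholders, not my real data:

```python
import json

# Sketch of the Erlang-only MapReduce query described above.
# The start/end range bounds are placeholder values.
query = {
    "inputs": {
        "bucket": "test",
        "index": "integer_int",
        "start": 0,          # placeholder lower bound
        "end": 1000000,      # placeholder upper bound
    },
    "query": [
        {"map": {"language": "erlang",
                 "module": "riak_kv_mapreduce",
                 "function": "map_object_value",
                 "arg": "filter_notfound"}},
        {"reduce": {"language": "erlang",
                    "module": "riak_kv_mapreduce",
                    "function": "reduce_sort",
                    "arg": "filter_notfound"}},
    ],
    "timeout": 9000,
}

# This is the request body POSTed to /mapred on the Riak HTTP interface.
body = json.dumps(query)
print(body)
```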

On Mon, Oct 10, 2011 at 6:57 AM, Jim Adler <jim.ad...@comcast.net> wrote:
> I'm seeing the same behavior and logs on a bucket with about 8M keys.
> Fyodor, any luck with any of Bryan's suggestions?
> Jim
>
> ________________________________
> From: "Bryan Fink" <br...@basho.com>
> To: "Fyodor Yarochkin" <fyodo...@armorize.com>
> Cc: riak-users@lists.basho.com
> Sent: Friday, October 7, 2011 6:06:15 AM
> Subject: Re: Riak 1.0 pre2 legacy_keylisting crash
>
> On Fri, Oct 7, 2011 at 1:50 AM, Fyodor Yarochkin <fyodo...@armorize.com>
> wrote:
>> Here's one of the queries that consistently generates series of
>> 'fitting_died' log messages:
>>
>> {
>>   "inputs":{
>>       "bucket":"test",
>>       "index":"integer_int",
> …
>>   },
>>   "query":[
>>    {"map":{"language":"javascript",
> …
>>    },
>>    {"reduce":{"language":"javascript",
> …
>>  {"reduce":{"language":"javascript",
> …
>>    ],"timeout": 9000
>> }
>>
>> produces over a hundred "Supervisor riak_pipe_vnode_worker_sup had
>> child at module undefined at <0.28835.0> exit with reason fitting_died
>> in context child_terminated" entries in the log file and returns 'timeout'
>
> My interpretation of your report is that 9 seconds is not long enough
> to finish your MapReduce query.  I'll explain how I arrived at this
> interpretation:
>
> The log message you're seeing says that many processes that
> riak_pipe_vnode_worker_sup was monitoring exited abnormally.  That
> supervisor only monitors Riak Pipe worker processes, the processes
> that do the work for Riak 1.0's MapReduce phases.
>
> The reason those workers gave for exiting abnormally was
> 'fitting_died'.  This means that the pipeline they were working for
> closed before they were finished with their work.
>
> The result you received was 'timeout'.  The way timeouts work in
> Riak-Pipe-based MapReduce is that a timer triggers a message at the
> given time, causing a monitoring process to cease waiting for results,
> tear down the pipe, and return a timeout message to your client.
>
> The "tear down the pipe" step in the timeout process is what causes
> all of those 'fitting_died' message you see.  They're normal, and are
> intended to aid in analysis like the above.
>
> With that behind us, though, the question remains: why isn't 9 seconds
> long enough to finish this query?  To figure that out, I'd start from
> the beginning:
>
> 1. Is 9 seconds long enough to just finish the index query (using the
> index API outside of MapReduce)?  If not, then the next people to jump
> in with help here will want to know more about the types, sizes, and
> counts of data you have indexed.
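(For anyone else following step 1: the bare index query can be timed on its own over the HTTP interface. This sketch only builds the 2i range-query URL; the host, port, and range bounds are assumptions, not values from the thread.)

```python
# Sketch: construct a bare secondary-index range query URL for the
# Riak 1.0 HTTP API, to time outside of MapReduce (e.g. with curl).
# Host, port, and range bounds are illustrative assumptions.
host = "localhost"
port = 8098
bucket = "test"
index = "integer_int"
start, end = 0, 1000000

url = "http://{0}:{1}/buckets/{2}/index/{3}/{4}/{5}".format(
    host, port, bucket, index, start, end)
print(url)
```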
>
> 2. Assuming the bare index query finishes fast enough, is 9 seconds
> long enough to get through just the index and map phase (no reduce
> phases)?  If not, it's likely that either it takes longer than 9
> seconds to pull every object matching your index query out of KV, or
> that contention for Javascript VMs prohibits the throughput needed.
>
> 2a. Try switching to an Erlang map phase.
> {"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"}
> should do exactly what your Javascript function does, without
> contending for a JS VM.
>
> 2b. Try increasing the number of JS VMs available for map phases.  In
> your app.config, find the 'map_js_vm_count' setting, and increase it.
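(For reference, the relevant app.config stanza looks roughly like this sketch; the value 24 is illustrative, not a recommendation:)

```erlang
%% Sketch of the riak_kv section of app.config.
{riak_kv, [
    {map_js_vm_count, 24}
    %% ... other riak_kv settings unchanged ...
]}.
```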
>
> 3. Assuming just the map phase also makes it through, is 9 seconds
> long enough to get through just the index, map, and first reduce phase
> (leave off the second)?  Your first reduce phase looks like it doesn't
> do anything … is it needed?  Try removing it.
>
> 4. If you get all the way to the final phase before hitting the 9
> second timeout, then it may be that the re-reduce behavior of Riak
> KV's MapReduce causes your function to be too expensive.  This will be
> especially true if you expect that phase to receive thousands of
> inputs.  A sort function such as yours probably doesn't benefit from
> re-reduce, so I would recommend disabling it by adding
> "arg":{"reduce_phase_only_1":true} to that reduce phase's
> specification.  With that in place, your function should be evaluated
> only once, with all the inputs it will receive.  This may still fail
> because of the time it can take to encode/decode a large set of
> inputs/outputs to/from JSON, but doing it only once may be enough to
> get you finished.
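(Concretely, a reduce phase spec with re-reduce disabled per point 4 would look something like this sketch; the Javascript source here is a placeholder sort, not the original function from the query:)

```python
import json

# Sketch: a Javascript reduce phase with re-reduce disabled via the
# reduce_phase_only_1 argument Bryan describes. The "source" function
# is a placeholder, not the function from the original query.
reduce_phase = {
    "reduce": {
        "language": "javascript",
        "source": "function(values) { return values.sort(); }",  # placeholder
        "arg": {"reduce_phase_only_1": True},
    }
}

print(json.dumps(reduce_phase))
```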
>
> Hope that helps,
> Bryan
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
