I am still experimenting with this issue (and I'll write a more detailed report on that), but here are my findings so far:
1. This behavior appears only when a large number of keys is returned by the secondary index filter. I am able to successfully execute the same set of map-reduce functions if I adjust the secondary index filter to return a smaller subset.

2. When I switched to an Erlang map function but kept the original JavaScript reduce function, I still got the same behavior. Switching to a very simple Erlang-based set of map-reduce functions works, and I am able to complete the query in under 9 seconds. Here's the set of functions I used in my testing:

{"map": {"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"} },
{"reduce": {"language":"erlang", "module":"riak_kv_mapreduce","function":"reduce_sort","arg":"filter_notfound"}}

So I believe the bottleneck that kills these queries is JavaScript VM overhead.

On Mon, Oct 10, 2011 at 6:57 AM, Jim Adler <jim.ad...@comcast.net> wrote:
> I'm seeing the same behavior and logs on a bucket with about 8M keys.
> Fyodor, any luck with any of Bryan's suggestions?
>
> Jim
>
> ________________________________
> From: "Bryan Fink" <br...@basho.com>
> To: "Fyodor Yarochkin" <fyodo...@armorize.com>
> Cc: riak-users@lists.basho.com
> Sent: Friday, October 7, 2011 6:06:15 AM
> Subject: Re: Riak 1.0 pre2 legacy_keylisting crash
>
> On Fri, Oct 7, 2011 at 1:50 AM, Fyodor Yarochkin <fyodo...@armorize.com>
> wrote:
>> Here's one of the queries that consistently generates a series of
>> 'fitting_died' log messages:
>>
>> {
>> "inputs":{
>> "bucket":"test",
>> "index":"integer_int",
> …
>> },
>> "query":[
>> {"map":{"language":"javascript",
> …
>> },
>> {"reduce":{"language":"javascript",
> …
>> {"reduce":{"language":"javascript",
> …
>> ],"timeout": 9000
>> }
>>
>> produces over a hundred "Supervisor riak_pipe_vnode_worker_sup had
>> child at module undefined at <0.28835.0> exit with reason fitting_died
>> in context child_terminated" entries in the log file and returns 'timeout'
>
> My interpretation of your report is that 9 seconds is not long enough
> to finish your MapReduce query. I'll explain how I arrived at this
> interpretation:
>
> The log message you're seeing says that many processes that
> riak_pipe_vnode_worker_sup was monitoring exited abnormally. That
> supervisor only monitors Riak Pipe worker processes, the processes
> that do the work for Riak 1.0's MapReduce phases.
>
> The reason those workers gave for exiting abnormally was
> 'fitting_died'. This means that the pipeline they were working for
> closed before they were finished with their work.
>
> The result you received was 'timeout'. The way timeouts work in
> Riak-Pipe-based MapReduce is that a timer triggers a message at the
> given time, causing a monitoring process to cease waiting for results,
> tear down the pipe, and return a timeout message to your client.
>
> The "tear down the pipe" step in the timeout process is what causes
> all of those 'fitting_died' messages you see. They're normal, and are
> intended to aid in analysis like the above.
>
> With that behind us, though, the question remains: why isn't 9 seconds
> long enough to finish this query? To figure that out, I'd start from
> the beginning:
>
> 1. Is 9 seconds long enough to just finish the index query (using the
> index API outside of MapReduce)? If not, then the next people to jump
> in with help here will want to know more about the types, sizes, and
> counts of data you have indexed.
>
> 2. Assuming the bare index query finishes fast enough, is 9 seconds
> long enough to get through just the index and map phase (no reduce
> phases)? If not, it's likely that either it takes longer than 9
> seconds to pull every object matching your index query out of KV, or
> that contention for JavaScript VMs prohibits the throughput needed.
>
> 2a. Try switching to an Erlang map phase.
> {"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"}
> should do exactly what your JavaScript function does, without
> contending for a JS VM.
>
> 2b. Try increasing the number of JS VMs available for map phases. In
> your app.config, find the 'map_js_vm_count' setting, and increase it.
>
> 3. Assuming just the map phase also makes it through, is 9 seconds
> long enough to get through just the index, map, and first reduce phase
> (leave off the second)? Your first reduce phase looks like it doesn't
> do anything … is it needed? Try removing it.
>
> 4. If you get all the way to the final phase before hitting the 9
> second timeout, then it may be that the re-reduce behavior of Riak
> KV's MapReduce causes your function to be too expensive. This will be
> especially true if you expect that phase to receive thousands of
> inputs. A sort function such as yours probably doesn't benefit from
> re-reduce, so I would recommend disabling it by adding
> "arg":{"reduce_phase_only_1":true} to that reduce phase's
> specification.
> With that in place, your function should be evaluated
> only once, with all the inputs it will receive. This may still fail
> because of the time it can take to encode/decode a large set of
> inputs/outputs to/from JSON, but doing it only once may be enough to
> get you finished.
>
> Hope that helps,
> Bryan
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
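To make Bryan's step-by-step approach easier to repeat, here is a minimal Python sketch that builds the JSON job bodies for steps 2 to 4: the index input plus the first phase, then plus each following phase, so each stage can be run in turn to see which one first exceeds the 9-second timeout. It reuses the phase specs quoted in this thread (the Erlang map_object_value map phase, and "arg":{"reduce_phase_only_1":true} on a JavaScript reduce phase). The helper names (ERLANG_MAP, js_reduce, build_job, staged_jobs), the Riak.reduceSort function name and the start/end index range are assumptions for illustration only, since the thread elides the actual index criteria and JavaScript sources. Sending each body to Riak (for example as a POST to the HTTP /mapred endpoint) is left out.

```python
import json

# Erlang map phase quoted in this thread; it runs without a JavaScript VM.
ERLANG_MAP = {"map": {"language": "erlang",
                      "module": "riak_kv_mapreduce",
                      "function": "map_object_value",
                      "arg": "filter_notfound"}}


def js_reduce(name, single_pass=False):
    """Return a JavaScript reduce phase that calls the named JS function.

    single_pass=True sets "arg" to {"reduce_phase_only_1": true}, the
    setting suggested in this thread to disable re-reduce.
    """
    spec = {"language": "javascript", "name": name}
    if single_pass:
        spec["arg"] = {"reduce_phase_only_1": True}
    return {"reduce": spec}


def build_job(bucket, index, index_query, phases, timeout_ms=9000):
    """Return a MapReduce job body (as a dict) with secondary-index inputs.

    index_query holds the index criteria (for example start/end for a
    range); the thread elides the real values, so callers supply them.
    """
    inputs = {"bucket": bucket, "index": index}
    inputs.update(index_query)
    return {"inputs": inputs, "query": list(phases), "timeout": timeout_ms}


def staged_jobs(bucket, index, index_query, phases, timeout_ms=9000):
    """Yield one job per phase prefix: phases[:1], phases[:2], and so on.

    Running them in order shows which added phase first exceeds the timeout.
    """
    for k in range(1, len(phases) + 1):
        yield build_job(bucket, index, index_query, phases[:k], timeout_ms)


if __name__ == "__main__":
    # Hypothetical index range and JS function name; substitute your own.
    phases = [ERLANG_MAP, js_reduce("Riak.reduceSort", single_pass=True)]
    for job in staged_jobs("test", "integer_int", {"start": 0, "end": 1000}, phases):
        print(json.dumps(job))
```

Step 1 (the bare index query) goes through the index API rather than MapReduce, so it is not generated here.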