On Fri, Oct 7, 2011 at 1:50 AM, Fyodor Yarochkin <fyodo...@armorize.com> wrote:
> Here's one of the queries that consistently generates series of
> 'fitting_died' log messages:
>
> {
>   "inputs":{
>     "bucket":"test",
>     "index":"integer_int", …
>   },
>   "query":[
>     {"map":{"language":"javascript", …
>     },
>     {"reduce":{"language":"javascript", …
>     {"reduce":{"language":"javascript", …
>   ],"timeout": 9000
> }
>
> produces over hundred of " "Supervisor riak_pipe_vnode_worker_sup had
> child at module undefined at <0.28835.0> exit with reason fitting_died
> in context child_terminated" entries in log file and returns 'timeout'
My interpretation of your report is that 9 seconds is not long enough to finish your MapReduce query. Here's how I arrived at that interpretation:

The log message you're seeing says that many of the processes that riak_pipe_vnode_worker_sup was monitoring exited abnormally. That supervisor only monitors Riak Pipe worker processes, the processes that do the work for Riak 1.0's MapReduce phases.

The reason those workers gave for exiting abnormally was 'fitting_died'. This means that the pipeline they were working for closed before they had finished their work.

The result you received was 'timeout'. Timeouts in Riak-Pipe-based MapReduce work like this: a timer sends a message at the given time, which causes a monitoring process to stop waiting for results, tear down the pipe, and return a timeout message to your client. The "tear down the pipe" step is what causes all of those 'fitting_died' messages you see. They're normal, and are intended to aid in analysis like the above.

With that behind us, though, the question remains: why isn't 9 seconds long enough to finish this query? To figure that out, I'd start from the beginning:

1. Is 9 seconds long enough to finish just the index query (using the index API outside of MapReduce)? If not, then the next people to jump in with help here will want to know more about the types, sizes, and counts of data you have indexed.

2. Assuming the bare index query finishes fast enough, is 9 seconds long enough to get through just the index and map phase (no reduce phases)? If not, it's likely either that it takes longer than 9 seconds to pull every object matching your index query out of KV, or that contention for Javascript VMs prevents the throughput needed.

2a. Try switching to an Erlang map phase. This spec should do exactly what your Javascript function does, without contending for a JS VM:
{"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"}

2b. Try increasing the number of JS VMs available for map phases. In your app.config, find the 'map_js_vm_count' setting and increase it.

3. Assuming the map phase also makes it through, is 9 seconds long enough to get through just the index, map, and first reduce phase (leave off the second)? Your first reduce phase looks like it doesn't do anything … is it needed? Try removing it.

4. If you get all the way to the final phase before hitting the 9-second timeout, then the re-reduce behavior of Riak KV's MapReduce may be making your function too expensive. This will be especially true if you expect that phase to receive thousands of inputs. A sort function such as yours probably doesn't benefit from re-reduce, so I would recommend disabling it by adding "arg":{"reduce_phase_only_1":true} to that reduce phase's specification. With that in place, your function should be evaluated only once, with all the inputs it will receive. This may still fail because of the time it can take to encode/decode a large set of inputs/outputs to/from JSON, but doing it only once may be enough to get you finished.

Hope that helps,
Bryan

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
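Steps 2 through 4 in the reply above amount to timing progressively longer prefixes of the same query and then applying the suggested changes. The Python sketch below shows one way to script that. It is not part of Riak or any Riak client library, and every helper name in it (phase_prefixes, with_erlang_map, with_reduce_only_once, time_query) is hypothetical. It assumes the query is POSTed as JSON to Riak's HTTP /mapred endpoint, where "timeout" is given in milliseconds. Step 1, the bare index query, goes through the index API and is not covered here.

```python
import copy
import json
import time
import urllib.error
import urllib.request

# Erlang map phase suggested in step 2a, copied from the reply.
ERLANG_MAP = {
    "language": "erlang",
    "module": "riak_kv_mapreduce",
    "function": "map_object_value",
    "arg": "filter_notfound",
}


def phase_prefixes(query):
    """Return copies of `query` keeping only the first k phases, k = 1..n.

    With the quoted query this yields: map only (step 2), map plus first
    reduce (step 3), and the full query (step 4). The input is not modified.
    """
    variants = []
    for k in range(1, len(query["query"]) + 1):
        q = copy.deepcopy(query)
        q["query"] = q["query"][:k]
        variants.append(q)
    return variants


def with_erlang_map(query):
    """Return a copy with each Javascript map phase replaced by ERLANG_MAP.

    Assumption: the Javascript map only returns object values, as the reply
    says of the original query; a "keep" flag, if present, is preserved.
    """
    q = copy.deepcopy(query)
    for phase in q["query"]:
        spec = phase.get("map")
        if spec is not None and spec.get("language") == "javascript":
            new_spec = dict(ERLANG_MAP)
            if "keep" in spec:
                new_spec["keep"] = spec["keep"]
            phase["map"] = new_spec
    return q


def with_reduce_only_once(query):
    """Return a copy whose last reduce phase disables re-reduce (step 4).

    Note: this replaces any existing "arg" on that reduce phase.
    """
    q = copy.deepcopy(query)
    for phase in reversed(q["query"]):
        if "reduce" in phase:
            phase["reduce"]["arg"] = {"reduce_phase_only_1": True}
            break
    return q


def time_query(base_url, query):
    """POST `query` to <base_url>/mapred; return (seconds, status, body)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/mapred",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Error responses (including a timeout) still carry a body worth reading.
        status, body = err.code, err.read()
    return time.monotonic() - start, status, body
```

For example, calling time_query("http://127.0.0.1:8098", q) for each q in phase_prefixes(query) shows which prefix first runs into the 9-second limit. The host and port here are assumptions, so use your node's HTTP listener. That phase is the one to target with with_erlang_map or with_reduce_only_once. The helpers return copies rather than editing in place, so the original query stays available for comparison.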