Aha, I just noticed that the native Erlang client is still using
luke_flow to implement its map-reduce, rather than riak_pipe. On some
level, this must be the reason for the differing behavior: either a
bug in riak_pipe, or a bug in the usage of riak_pipe somewhere in the
chain?

Like, maybe someone is forgetting to send end-of-input and causing a
5-second timeout to occur, something like that? (Just wild
speculation.)
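To illustrate the kind of bug I'm imagining (purely hypothetical code,
not anything actually in riak_pipe): a result collector like this only
returns through its timeout clause when nobody sends it end-of-input,
adding a constant ~5 seconds to every request:

    %% Hypothetical collector loop; 'eoi' stands in for whatever
    %% end-of-input message the real code uses.
    collect(Acc) ->
        receive
            {result, R} -> collect([R | Acc]);
            eoi         -> {ok, lists:reverse(Acc)}
        after 5000 ->
            %% only reached if 'eoi' is never sent
            {timeout, lists:reverse(Acc)}
        end.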
Mike

On Mon, Apr 2, 2012 at 2:39 PM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Michael,
>
> These are some interesting observations. Thank you for digging into this
> more. At first glance it appears that it has something to do with the PB
> client/server. If you are feeling adventurous, might I suggest using fprof
> to analyze where the time is being spent. I wrote up a gist on how you can
> do that here:
>
> https://gist.github.com/2287383
>
> I ran this analysis myself on a much smaller dataset. A quick glance at the
> results shows most of the time being spent in suspend, encode/decode (which
> I suspected), and queuing data in riak_pipe. I'm really curious to see what
> your analysis looks like. If you can produce one, I'm happy to help analyze.
> Thanks again for your detailed analysis thus far.
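>
> Roughly, the shape of it is this (a sketch from memory, not the gist
> verbatim; Fun stands for whatever call you want to profile):
>
>     fprof:apply(Fun, []),                       %% run the call under fprof tracing
>     fprof:profile(),                            %% turn the raw trace into statistics
>     fprof:analyse([{dest, "fprof.analysis"}]).  %% write the human-readable output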
>
> -Ryan
>
> On Mon, Apr 2, 2012 at 2:33 PM, Michael Radford <m...@blorf.com> wrote:
>>
>> I modified the pb client code to print the elapsed time when it
>> receives the first map-reduce result (in the function
>> wait_for_mapred), and tried to run a more apples-to-apples comparison
>> of the same map-reduce job via the local client and the pb client
>> connecting to 127.0.0.1.
>>
>> In a nutshell, the majority of the extra delay from the pb client
>> occurs before it receives the first result (e.g., 6 out of 7 seconds).
>> (It's still troubling that it takes another second to receive the rest
>> of the results when reading from localhost, but maybe I can avoid that
>> by calling a different reduce phase that somehow packages my results
>> into a single message?)
>>
>> Next, it seems like I should try instrumenting the server side of the
>> protobufs interface with timing; any pointers on the best place(s) to
>> do that would be appreciated.
>>
>> Here is the function I used with the pb client to localhost, and the
>> results:
>>
>> TP =
>>   fun (Bucket, Query) ->
>>     {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
>>     Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
>>     Phases =
>>       [ {reduce,
>>          {modfun, riak_kv_mapreduce, reduce_identity},
>>          [{reduce_phase_only_1, true}],
>>          true} ],
>>     {MuSec, {ok, [{0, Results}]}} =
>>       timer:tc(fun () -> riakc_pb_socket:mapred(Pid, Inputs, Phases) end),
>>     riakc_pb_socket:stop(Pid),
>>     io:format(user, "~b results, ~f sec~n",
>>               [length(Results), MuSec/1000000])
>>   end.
>>
>> 3> TP(Bucket,<<"full_text:flower">>).
>> first M-R result in 5848490 musec
>> 14378 results, 7.235226 sec
>> ok
>> 4> TP(Bucket,<<"full_text:flower">>).
>> first M-R result in 5978044 musec
>> 14378 results, 7.423933 sec
>> ok
>> 5> TP(Bucket,<<"full_text:flower">>).
>> first M-R result in 1219112 musec
>> 14378 results, 2.563079 sec
>> ok
>> 6> TP(Bucket,<<"full_text:flower">>).
>> first M-R result in 6017087 musec
>> 14378 results, 7.326352 sec
>> ok
>> 7> TP(Bucket,<<"full_text:green">>).
>> first M-R result in 9546654 musec
>> 24046 results, 13.561474 sec
>> ok
>> 8> TP(Bucket,<<"full_text:green">>).
>> first M-R result in 1878109 musec
>> 24047 results, 5.188646 sec
>> ok
>> 9> TP(Bucket,<<"full_text:green">>).
>> first M-R result in 9724651 musec
>> 24047 results, 12.692941 sec
>> ok
>> 10> TP(Bucket,<<"full_text:green">>).
>> first M-R result in 9627592 musec
>> 24047 results, 12.769256 sec
>> ok
>>
>> And here is the function I used for the local client, and the results:
>>
>> TL =
>>   fun (Bucket, Query) ->
>>     {ok, Client} = riak:local_client(),
>>     Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
>>     Phases =
>>       [ {reduce,
>>          {modfun, riak_kv_mapreduce, reduce_identity},
>>          [{reduce_phase_only_1, true}],
>>          true} ],
>>     {MuSec, {ok, Results}} =
>>       timer:tc(fun () -> Client:mapred(Inputs, Phases) end),
>>     io:format(user, "~b results, ~f sec~n",
>>               [length(Results), MuSec/1000000])
>>   end.
>>
>> (riak@10.174.223.37)27> TL(Bucket,<<"full_text:flower">>).
>> 14340 results, 0.106422 sec
>> ok
>> (riak@10.174.223.37)28> TL(Bucket,<<"full_text:flower">>).
>> 14340 results, 0.102255 sec
>> ok
>> (riak@10.174.223.37)29> TL(Bucket,<<"full_text:flower">>).
>> 14340 results, 0.099877 sec
>> ok
>> (riak@10.174.223.37)30> TL(Bucket,<<"full_text:blue">>).
>> 25017 results, 0.210381 sec
>> ok
>> (riak@10.174.223.37)31> TL(Bucket,<<"full_text:blue">>).
>> 25017 results, 0.200112 sec
>> ok
>> (riak@10.174.223.37)32> TL(Bucket,<<"full_text:blue">>).
>> 25018 results, 0.207553 sec
>> ok
>> (riak@10.174.223.37)33> TL(Bucket,<<"full_text:green">>).
>> 23945 results, 0.173061 sec
>> ok
>> (riak@10.174.223.37)34> TL(Bucket,<<"full_text:green">>).
>> 23945 results, 0.191595 sec
>> ok
>> (riak@10.174.223.37)35> TL(Bucket,<<"full_text:green">>).
>> 23945 results, 0.177821 sec
>> ok
>>
>> Mike
>>
>> On Mon, Apr 2, 2012 at 8:46 AM, Michael Radford <m...@blorf.com> wrote:
>> > Hi Ryan,
>> >
>> > This is getting interesting: the same queries, when executed using
>> > local clients from 'riak attach', are taking only 100-250 ms.
>> >
>> > However, I just tried the same test I was running remotely on Saturday
>> > on one of the machines in the Riak cluster, using the protobufs client
>> > to connect to 127.0.0.1, and it's still taking 6-7 seconds per query
>> > (still with an occasional dip down to 2-3 seconds).
>> >
>> > These machines are on Amazon EC2, so I have little control over the
>> > network layout. But that includes the 4 Riak boxes, so if their
>> > inter-node communication were suffering from similar issues, I would
>> > have expected to see it affecting other map-reduce queries. (We're
>> > running lots of multi-key lookups via map-reduce that return thousands
>> > of objects in a few ms, much larger than the keys returned from these
>> > searches.)
>> >
>> > And again, there is a huge difference between the same query using the
>> > local client (200 ms) and using the protobufs client from the exact
>> > same Riak machine connecting to localhost (6-7 sec, but occasionally
>> > 2-3 sec).
>> >
>> > Mike
>> >
>> > On Mon, Apr 2, 2012 at 7:44 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
>> >> Hi Michael, you'll find my responses inline...
>> >>
>> >> On Sat, Mar 31, 2012 at 5:04 PM, Michael Radford <m...@blorf.com> wrote:
>> >>>
>> >>> I'm seeing very slow performance from Riak search even when querying
>> >>> single terms, and I'd appreciate any advice on how to get insight into
>> >>> where the time is going.
>> >>>
>> >>> Right now, I'm using this function to time queries with the Erlang pb
>> >>> client:
>> >>>
>> >>> TS =
>> >>>   fun (Pid, Bucket, Query) ->
>> >>>     T0 = now(),
>> >>>     {ok, Results} = riakc_pb_socket:search(Pid, Bucket, Query),
>> >>>     MuSec = timer:now_diff(now(), T0),
>> >>>     io:format(user, "~b results, ~f sec~n",
>> >>>               [length(Results), MuSec/1000000])
>> >>>   end.
>> >>
>> >> Just an FYI, you should check out `timer:tc`.
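>> >> For example (an untested sketch, using the same variables as your fun):
>> >>
>> >>     {MuSec, {ok, Results}} =
>> >>         timer:tc(riakc_pb_socket, search, [Pid, Bucket, Query]).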
>> >>>
>> >>> The bucket I'm querying currently has ~300k keys total (each 16
>> >>> bytes). (The whole cluster has maybe 1.5M entries in about a dozen
>> >>> buckets. It's running 1.0.2, 4 nodes on 4 8-core c1.xlarge EC2
>> >>> instances.)
>> >>>
>> >>> For single-term queries that return 10k+ keys, I'm routinely seeing
>> >>> times above 6 seconds to run the above function. Intermittently,
>> >>> however, I'll see the same queries take only 2 seconds:
>> >>>
>> >>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> >>> 12574 results, 6.094149 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> >>> 12574 results, 1.938894 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> >>> 12574 results, 1.981492 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> >>> 12574 results, 6.120589 sec
>> >>> ok
>> >>>
>> >>> > TS(Pid,Bucket,<<"full_text:red">>).
>> >>> 13461 results, 6.572473 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:red">>).
>> >>> 13461 results, 6.626049 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:red">>).
>> >>> 13461 results, 2.155847 sec
>> >>> ok
>> >>>
>> >>> Queries with fewer results are still slow, but not quite as slow as 6
>> >>> seconds:
>> >>>
>> >>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> >>> 6446 results, 2.831806 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> >>> 6446 results, 3.037162 sec
>> >>> ok
>> >>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> >>> 6447 results, 0.780944 sec
>> >>> ok
>> >>>
>> >>> And queries with no matches only take a few milliseconds:
>> >>>
>> >>> > TS(Pid,Bucket,<<"full_text:blorf">>).
>> >>> 0 results, 0.003269 sec
>> >>> ok
>> >>>
>> >>> During the slow queries, none of the 4 machines seems to be fully
>> >>> taxing even one CPU, or doing almost any disk I/O.
>> >>
>> >> What does the intra-/inter-node network look like?
>> >>
>> >>> As far as I can tell from looking at the riak_kv/riak_search source,
>> >>> my query should only be hitting the index and streaming back the keys,
>> >>> not trying to read every document from disk or sort by score. Is that
>> >>> correct?
>> >>
>> >> It will not read the documents at all, but it will sort on score.
>> >> Currently there is no way to disable sorting.
>> >>
>> >>> Assuming that's the case, I can't imagine why it takes so long to look
>> >>> up 10k keys from the index for a single term, or why the times seem to
>> >>> be bimodal (which seems like a big clue). Any pointers welcome!
>> >>
>> >> Where is your client sitting in relation to your cluster? Is it on the
>> >> local network? Could you try attaching to one of your riak nodes,
>> >> running the query there, and comparing results?
>> >>
>> >> e.g.
>> >>
>> >> riak attach
>> >>
>> >> > search:search(Bucket, Query).
>> >>
>> >> -Ryan