I modified the pb client code to print the elapsed time when it receives the first map-reduce result (in the function wait_for_mapred), and tried to run a more apples-to-apples comparison of the same map-reduce job via the local client and the pb client connecting to 127.0.0.1.
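For reference, the change amounted to recording a start time when the client begins waiting and printing the delta when the first map-reduce message arrives. This is only a sketch of the idea, not the actual riakc_pb_socket internals: the real wait_for_mapred differs across client versions, and the message tags and result shapes below are illustrative.

```erlang
%% Sketch only: record T0 on entry, print elapsed time on the *first*
%% map-reduce result, then keep accumulating as before. Message tags
%% ({ReqId, done} etc.) are illustrative, not the real protocol.
wait_for_mapred(ReqId, Timeout) ->
    wait_for_mapred(ReqId, Timeout, os:timestamp(), false, []).

wait_for_mapred(ReqId, Timeout, T0, GotFirst, Acc) ->
    receive
        {ReqId, done} ->
            {ok, lists:reverse(Acc)};
        {ReqId, {mapred, Phase, Res}} ->
            case GotFirst of
                false ->
                    io:format(user, "first M-R result in ~b musec~n",
                              [timer:now_diff(os:timestamp(), T0)]);
                true ->
                    ok
            end,
            wait_for_mapred(ReqId, Timeout, T0, true, [{Phase, Res} | Acc]);
        {ReqId, {error, Reason}} ->
            {error, Reason}
    after Timeout ->
        {error, timeout}
    end.
```

That's where the "first M-R result in N musec" lines in the transcripts below come from.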
In a nutshell, the majority of the extra delay from the pb client occurs before it receives the first result (e.g., 6 out of 7 seconds). (It's still troubling that it takes another second to receive the rest of the results when reading from localhost, but maybe I can avoid that by calling a different reduce phase that somehow packages my results into a single message?)

Next, it seems like I should try instrumenting the server side of the protobufs interface with timing. Any pointers on the best place(s) to do that would be appreciated.

Here is the function I used with the pb client to localhost, and the results:

TP =
fun (Bucket, Query) ->
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
    Phases =
      [{reduce, {modfun, riak_kv_mapreduce, reduce_identity},
        [{reduce_phase_only_1, true}], true}],
    {MuSec, {ok, [{0, Results}]}} =
      timer:tc(fun () -> riakc_pb_socket:mapred(Pid, Inputs, Phases) end),
    riakc_pb_socket:stop(Pid),
    io:format(user, "~b results, ~f sec~n", [length(Results), MuSec/1000000])
end.

3> TP(Bucket,<<"full_text:flower">>).
first M-R result in 5848490 musec
14378 results, 7.235226 sec
ok
4> TP(Bucket,<<"full_text:flower">>).
first M-R result in 5978044 musec
14378 results, 7.423933 sec
ok
5> TP(Bucket,<<"full_text:flower">>).
first M-R result in 1219112 musec
14378 results, 2.563079 sec
ok
6> TP(Bucket,<<"full_text:flower">>).
first M-R result in 6017087 musec
14378 results, 7.326352 sec
ok
7> TP(Bucket,<<"full_text:green">>).
first M-R result in 9546654 musec
24046 results, 13.561474 sec
ok
8> TP(Bucket,<<"full_text:green">>).
first M-R result in 1878109 musec
24047 results, 5.188646 sec
ok
9> TP(Bucket,<<"full_text:green">>).
first M-R result in 9724651 musec
24047 results, 12.692941 sec
ok
10> TP(Bucket,<<"full_text:green">>).
first M-R result in 9627592 musec
24047 results, 12.769256 sec
ok

And here is the function I used for the local client, and the results:

TL =
fun (Bucket, Query) ->
    {ok, Client} = riak:local_client(),
    Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
    Phases =
      [{reduce, {modfun, riak_kv_mapreduce, reduce_identity},
        [{reduce_phase_only_1, true}], true}],
    {MuSec, {ok, Results}} =
      timer:tc(fun () -> Client:mapred(Inputs, Phases) end),
    io:format(user, "~b results, ~f sec~n", [length(Results), MuSec/1000000])
end.

(riak@10.174.223.37)27> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.106422 sec
ok
(riak@10.174.223.37)28> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.102255 sec
ok
(riak@10.174.223.37)29> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.099877 sec
ok
(riak@10.174.223.37)30> TL(Bucket,<<"full_text:blue">>).
25017 results, 0.210381 sec
ok
(riak@10.174.223.37)31> TL(Bucket,<<"full_text:blue">>).
25017 results, 0.200112 sec
ok
(riak@10.174.223.37)32> TL(Bucket,<<"full_text:blue">>).
25018 results, 0.207553 sec
ok
(riak@10.174.223.37)33> TL(Bucket,<<"full_text:green">>).
23945 results, 0.173061 sec
ok
(riak@10.174.223.37)34> TL(Bucket,<<"full_text:green">>).
23945 results, 0.191595 sec
ok
(riak@10.174.223.37)35> TL(Bucket,<<"full_text:green">>).
23945 results, 0.177821 sec
ok

Mike

On Mon, Apr 2, 2012 at 8:46 AM, Michael Radford <m...@blorf.com> wrote:
> Hi Ryan,
>
> This is getting interesting: the same queries, when executed using
> local clients from 'riak attach', are taking only 100-250 ms.
>
> However, I just tried the same test I was running remotely on Saturday
> on one of the machines in the Riak cluster, using the protobufs client
> to connect to 127.0.0.1, and it's still taking 6-7 seconds per query
> (still with an occasional dip down to 2-3 seconds).
>
> These machines are on Amazon EC2, so I have little control over the
> network layout.
> But that includes the 4 Riak boxes, so if their
> inter-node communication was suffering from similar issues, I would
> have expected to see it affecting other map-reduce queries. (We're
> running lots of multi-key lookups via map-reduce that return thousands
> of objects in a few ms, much larger than the keys returned from these
> searches.)
>
> And again, there is a huge difference between the same query using the
> local client (200 ms) and using the protobufs client from the exact
> same Riak machine connecting to localhost (6-7 sec, but occasionally
> 2-3 sec).
>
> Mike
>
> On Mon, Apr 2, 2012 at 7:44 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
>> Hi Michael, you'll find my responses inline...
>>
>> On Sat, Mar 31, 2012 at 5:04 PM, Michael Radford <m...@blorf.com> wrote:
>>>
>>> I'm seeing very slow performance from Riak search even when querying
>>> single terms, and I'd appreciate any advice on how to get insight into
>>> where the time is going.
>>>
>>> Right now, I'm using this function to time queries with the Erlang pb
>>> client:
>>>
>>> TS =
>>> fun (Pid, Bucket, Query) ->
>>>     T0 = now(),
>>>     {ok, Results} = riakc_pb_socket:search(Pid, Bucket, Query),
>>>     MuSec = timer:now_diff(now(), T0),
>>>     io:format(user, "~b results, ~f sec~n",
>>>               [length(Results), MuSec/1000000])
>>> end.
>>
>> Just an FYI, you should check out `timer:tc`.
>>>
>>> The bucket I'm querying currently has ~300k keys total (each 16
>>> bytes). (The whole cluster has maybe 1.5M entries in about a dozen
>>> buckets. It's running 1.0.2, 4 nodes on 4 8-core c1.xlarge EC2
>>> instances.)
>>>
>>> For single-term queries that return 10k+ keys, I'm routinely seeing
>>> times above 6 seconds to run the above function. Intermittently,
>>> however, I'll see the same queries take only 2 seconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 6.094149 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 1.938894 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 1.981492 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 6.120589 sec
>>> ok
>>>
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 6.572473 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 6.626049 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 2.155847 sec
>>> ok
>>>
>>> Queries with fewer results are still slow, but not quite as slow as 6
>>> seconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6446 results, 2.831806 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6446 results, 3.037162 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6447 results, 0.780944 sec
>>> ok
>>>
>>> And queries with no matches take only a few milliseconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:blorf">>).
>>> 0 results, 0.003269 sec
>>> ok
>>>
>>> During the slow queries, none of the 4 machines seems to be fully
>>> taxing even one CPU, or doing almost any disk I/O.
>>
>> What does the intra/inter-node network look like?
>>
>>> As far as I can tell from looking at the riak_kv/riak_search source,
>>> my query should only be hitting the index and streaming back the keys,
>>> not trying to read every document from disk or sort by score. Is that
>>> correct?
>>
>> It will not read the documents at all, but it will sort on score. Currently
>> there is no way to disable sorting.
>>
>>> Assuming that's the case, I can't imagine why it takes so long to look
>>> up 10k keys from the index for a single term, or why the times seem to
>>> be bimodal (which seems like a big clue). Any pointers welcome!
>>
>> Where is your client sitting in regard to your cluster? Is it on the local
>> network? Could you try attaching to one of your riak nodes, running the
>> query there, and comparing the results?
>>
>> e.g.
>>
>> riak attach
>>> search:search(Bucket, Query).
>>
>> -Ryan
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com