On 3 Jun 2011, at 00:55, Jacques wrote:

> I'm no expert on character encoding, but I don't think so once we use those 
> strings in the map/reduce JSON object.   Unless we strip a bunch of 
> characters from the character set, I believe we would choke the JSON parsing 
> on the Riak side, since ultimately the job would be read as a UTF-8 string.

Good point. I didn't think far enough downstream.

> <<snip>>
> 
>> I've noticed that there is a secondary Erlang format that can be passed for 
>> map/reduce jobs; must I use that?  If so, does anyone have an example of 
>> generating one of these from within Java?
> 
> The Java PB client doesn't currently support the application/x-erlang-binary 
> content-type for map/reduce jobs. I think that only the Erlang PB client does.
> 
> I understand that the current client doesn't support this.  I was thinking 
> more of using JInterface to generate the Erlang version of the map/reduce 
> job.  I haven't worked with it and really don't have any knowledge of 
> Erlang types, but figured it might be possible.  I guess the question 
> was whether anybody thought this was feasible.

I think that is feasible. Use OtpOutputStream and OtpInputStream to 
encode/decode. I'm playing with JInterface right now, writing a basho_bench 
driver for the Java client, so I have that head on. I'll give it some time this 
weekend.
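For what it's worth, here's a minimal round-trip sketch, assuming the JInterface jar (com.ericsson.otp.erlang) is on the classpath. The {bucket, key} tuple shape is just an illustration of encoding binaries, not Riak's actual map/reduce job term:

```java
import com.ericsson.otp.erlang.OtpErlangBinary;
import com.ericsson.otp.erlang.OtpErlangObject;
import com.ericsson.otp.erlang.OtpErlangTuple;
import com.ericsson.otp.erlang.OtpInputStream;
import com.ericsson.otp.erlang.OtpOutputStream;

public class TermRoundTrip {
    public static void main(String[] args) throws Exception {
        // Arbitrary (non-UTF-8) bucket/key bytes -- the whole point of an
        // erlang-binary content-type is that these never hit a JSON parser.
        OtpErlangBinary bucket = new OtpErlangBinary(new byte[] { (byte) 0xFF, 0x00, 0x42 });
        OtpErlangBinary key    = new OtpErlangBinary(new byte[] { (byte) 0x9C, 0x01 });
        OtpErlangTuple pair = new OtpErlangTuple(new OtpErlangObject[] { bucket, key });

        // Encode to the Erlang external term format. When actually sending the
        // bytes to an Erlang node you would likely need to prepend the external
        // format version tag (131), which term_to_binary/binary_to_term use.
        OtpOutputStream out = new OtpOutputStream();
        out.write_any(pair);
        byte[] wire = out.toByteArray();

        // Decode it back to show the round trip is lossless.
        OtpErlangObject decoded = new OtpInputStream(wire).read_any();
        System.out.println(decoded.equals(pair));
    }
}
```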

> 
> <<snip>> 
> 
> I was asking on the Riak side. Our real need is multi-get.  We're looking for 
> regular pulls of 20 random bucket/key values.  We need to minimize the 
> latency of each of the pulls.  I know that we could split this into 20 separate 
> sockets. (Ick... this gets ugly when we're talking about many simultaneous 
> pulls from multiple servers.  I'd rather not create pools of 100 sockets per 
> requesting server if I could avoid it.)   I was wondering, if we sent 
> four requests in a row to Riak on the same socket, whether it would work on 
> them all at once or serially.  I'm guessing serially.

Serially.

> 
> I'm sure I can make it work using the path you referenced above and 
> Base85 or Base64.  It is a simple return-all-values map job.  Based on what 
> you're saying, it seems like we have three options:
> 
> - Use a string-based binary encoding for our bucket/key names (e.g. Base64)
> - Use a truckload of sockets.  (How will Riak perform if we are generating, 
> say, 200-500 connections per Riak node?)
> - Try to figure out how to encode an Erlang version of the map/reduce job using 
> JInterface (assuming the MapReduce API supports binary buckets and keys if 
> using an Erlang content-type).
> 
> Am I missing any options?  I'm inclined to see if option 3 is available 
> unless someone says that is a fool's dream.

I like the sound of option 3 also. I'll have a look at it this weekend and get 
back to you.
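On option 1, the encoding step itself is cheap. A sketch with java.util.Base64 (Java 8+; on older JVMs Commons Codec's Base64 does the same job) -- the key bytes here are made up:

```java
import java.util.Arrays;
import java.util.Base64;

public class KeyNameEncoding {
    public static void main(String[] args) {
        // Arbitrary binary key bytes that would choke a JSON/UTF-8 map/reduce job.
        byte[] rawKey = new byte[] { (byte) 0xFF, 0x00, (byte) 0x9C, 0x42 };

        // Base64 turns them into a pure-ASCII name that survives the JSON round trip.
        String safeName = Base64.getEncoder().encodeToString(rawKey);
        System.out.println(safeName);  // /wCcQg==

        // Decode on the way back out to recover the original bytes exactly.
        byte[] decoded = Base64.getDecoder().decode(safeName);
        System.out.println(Arrays.equals(decoded, rawKey));
    }
}
```

One wrinkle: standard Base64 output contains "/" and "+", which matters if these names ever end up in URLs; the URL-safe variant (Base64.getUrlEncoder()) avoids those characters.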

> 
> Thanks,
> Jacques
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
