Map Reduce Requirements
In order to run a map reduce query against Riak, does the data need to be stored in JSON? If this isn't a requirement, then how would I run a query against data stored in Google Protocol Buffers format? Is there an example of this somewhere?

Thanks!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Fwd: Map Reduce Requirements
Forgot to reply-all...

-- Forwarded message --
From: bill robertson
Date: Mon, Aug 22, 2011 at 3:18 PM
Subject: Re: Map Reduce Requirements
To: Jeremiah Peschka

That makes sense.

Suppose I have a query called Q1, and I would like to specify Q1 in JavaScript. Assume that I can write an Erlang function called F that will translate the raw GPB bytes into the appropriate JSON for use by Q1. How would I hook F into the processing of Q1? I guess that the JavaScript function would be passed the GPB bytes in the reduce phase, at which point I could call my translation function, operate on the JSON, and possibly pass a structure containing both the JSON and the GPB on to the next phase. Does that make sense? Is it possible to invoke arbitrary Erlang functions from within JavaScript like this? If so, are there examples?

Additionally, are secondary indexes metadata? i.e., if I built some secondary indices, these are stored in some form internal to Riak, and are therefore available for query regardless of the type of data they're associated with. Is this correct?

Thanks,
Bill Robertson

On Mon, Aug 22, 2011 at 2:57 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
> You can MR across whatever kind of data you'd like. JSON is typically used
> because it's very easy to show people how to query JSON, and the structure
> makes sense to many programmers.
>
> To MR across anything else, you'll want a library that will translate your
> protocol buffers encoded data into objects that can be parsed in either
> JavaScript or Erlang. That is to say, you'll need a
> serialization/deserialization function to translate data at rest
> (protobufs) into data that the MR program can understand.
>
> Since there are protocol buffer libraries for many languages, this should
> be doable in either JavaScript or Erlang. I don't know of any examples, but
> it shouldn't be much more difficult than Riak.mapValuesJson, provided that
> you can find some easy magic to translate objects for you ;)
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
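One way to follow Jeremiah's suggestion on the Erlang side is to do the deserialization inside an Erlang map phase. The module below is a minimal sketch, not an existing Riak API: the module and function names are hypothetical, and decode/1 assumes values were stored with term_to_binary/1, standing in for a real Protocol Buffers decoder. It relies on Riak calling an Erlang map function as Fun(Value, KeyData, Arg), where Value is a riak_object, or {error, notfound} when the key is missing.

```erlang
%% pb_mapred: hypothetical module sketching an Erlang map phase that decodes
%% stored values before emitting them, in the spirit of Riak.mapValuesJson.
-module(pb_mapred).
-export([map_decoded_value/3, decode/1]).

%% Riak invokes Erlang map functions as Fun(Value, KeyData, Arg).
%% A missing key arrives as {error, notfound}; emit nothing for it.
map_decoded_value({error, notfound}, _KeyData, _Arg) ->
    [];
%% Otherwise Value is a riak_object. riak_object:get_value/1 assumes the
%% object has exactly one value (no siblings).
map_decoded_value(RiakObject, _KeyData, _Arg) ->
    [decode(riak_object:get_value(RiakObject))].

%% Stand-in decoder: assumes values were written with term_to_binary/1.
%% For Protocol Buffers data, replace this body with a call into code
%% generated from your .proto file (e.g. a hypothetical
%% my_msg_pb:decode_my_msg(Bin)).
decode(Bin) when is_binary(Bin) ->
    binary_to_term(Bin).
```

The compiled module has to be on the code path of every Riak node (for example via -pa in vm.args) before a phase such as {map, {modfun, pb_mapred, map_decoded_value}, none, true} can use it.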
Re: Map Reduce Requirements
On Mon, Aug 22, 2011 at 3:27 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
> I don't think Erlang can talk to JavaScript inside a single
> phase/function/pile of source code. I could be wrong, but it seems to me
> that marshaling data across the JavaScript/Erlang boundary would be hella
> expensive and cause a lot of problems and, as such, probably doesn't exist.

It certainly seems sub-optimal. I wonder if it would be feasible to deploy an Erlang web service in the Riak node's webmachine instance that could translate metadata into Erlang funs and drive the map reduce operation that way. I'm not sure I could avoid baking specific knowledge of the protobuf structures into that code, but I don't think it matters in this case.

I also wonder how much 1.0 will change this picture.

>> Additionally, are secondary indexes metadata? i.e., if I built some
>> secondary indices, these are stored in some form internal to Riak, and
>> therefore available for query regardless of the type of data they're
>> associated with. Is this correct?
>
> Secondary indexes are a separate physical structure, or so I gather. (Rusty
> could be full of lies.) They're stored separately from the initial data and
> not as metadata in the object headers. So, yes, you can store whatever you
> want in secondary indexes and query it however you want, provided there's an
> API that supports what you're doing.

Would secondary indexes eliminate the need for key filtering? Logically, it would seem that you could do the same thing with indexes, but do they have similar performance characteristics? (i.e., does one suck more than the other?)

Thanks again,
Bill Robertson
Problems Executing Erlang Map Reduce
I have a 0.14.2 dev database set up from the Riak Fast Track tutorial.

I created an Erlang application that creates a new route for webmachine upon startup.

    configure_webmachine() ->
        webmachine_router:add_route(
          {["foo", bar, '*'], myapp_myresource, []}).

I modified vm.args for all three instances to point to the application by adding the following lines.

    -pa /home/me/my_app/ebin
    -s my_app

I am able to reach the /foo URL from all three instances of Riak (localhost:8091-93), so I assume that every instance also has access to my map-reduce code.

Also included in the application is a module with a test map-reduce query in it. I have a binary-encoded Erlang term stored in the database, and I am down to simply trying to retrieve that one entry and return 'something' from a map function (i.e., I've been cutting things out until there is nothing left to cut out), and I am receiving errors.

Here is the mapred code.

    Conn = riakc_pb_socket:start("127.0.0.1", 8091),
    A = fun(Value, Key, Arg) -> args(Value, Key, Arg) end,
    Result = riakc_pb_socket:mapred(Conn, [{<<"bucket">>, <<"key">>}],
                                    [{map, {qfun, A}, none, true}]),

    args(_Value, _Key, _Arg) ->
        [args].

This code bombs, and then I get the following back from webmachine in the browser (binary goo elided).
    Internal Server Error
    The server encountered an error while processing this request:

    {error,
        {exit,
            {{function_clause,
                 [{gen,call,
                      [{ok,<0.268.0>},
                       '$gen_call',
                       {req,
                           {rpbmapredreq,
                               <<131,108, ...,106>>,
                               <<"application/x-erlang-binary">>},
                           60100,
                           {1061003,<0.252.0>}},
                       6]},
                  {gen_server,call,3},
                  {riakc_pb_socket,mapred,5},
                  {myapp_mapred,do_mapred,6},
                  {myapp_myresource,to_html,2},
                  {webmachine_resource,resource_call,3},
                  {webmachine_resource,do,3},
                  {webmachine_decision_core,resource_call,1}]},
             {gen_server,call,
                 [{ok,<0.268.0>},
                  {req,
                      {rpbmapredreq,
                          <<131,108, ...,106>>,
                          <<"application/x-erlang-binary">>},
                      60100,
                      {1061003,<0.252.0>}},
                  6]}},
            [{gen_server,call,3},
             {riakc_pb_socket,mapred,5},
             {myapp_mapred,do_mapred,6},
             {myapp_myresource,to_html,2},
             {webmachine_resource,resource_call,3},
             {webmachine_resource,do,3},
             {webmachine_decision_core,resource_call,1},
             {webmachine_decision_core,decision,1}]}}

sasl-error.log and erlang.log have the same error information as the browser on the node that I made the request to. I checked the logs of one of the other nodes and found nothing.

I must be doing something wrong, but I have no idea what. Any suggestions as to what to do?
Re: Problems Executing Erlang Map Reduce
I finally found it after looking at the Fast Track tutorial again.

Mistake #1 -> Erlang noob:

    Conn = riakc_pb_socket:start("127.0.0.1", 8091)

is wrong; it should be

    {ok, Conn} = ...

And after that I had the port wrong (the Fast Track dev databases listen for Protocol Buffers on 8081-8083; 8091-8093 are their HTTP ports).

Thanks!
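With both fixes applied, the test query might look like the sketch below. riakc_pb_socket:start/2 returns {ok, Pid}; binding the whole tuple to Conn meant gen:call was handed {ok,<0.268.0>} instead of a pid, which no clause matches, hence the function_clause in the trace. The module name myapp_mapred_fixed is hypothetical, and the port assumes dev1 of the Fast Track setup.

```erlang
%% Hypothetical module: the original test query with both fixes applied.
-module(myapp_mapred_fixed).
-export([run/0, args/3]).

run() ->
    %% start/2 returns {ok, Pid}: match on it rather than binding the tuple.
    %% 8081 is assumed to be dev1's Protocol Buffers port (8091 is HTTP).
    {ok, Conn} = riakc_pb_socket:start("127.0.0.1", 8081),
    A = fun(Value, Key, Arg) -> args(Value, Key, Arg) end,
    %% qfun ships the fun itself, so this module must also be loaded
    %% on the Riak nodes (here via -pa in vm.args).
    Result = riakc_pb_socket:mapred(Conn, [{<<"bucket">>, <<"key">>}],
                                    [{map, {qfun, A}, none, true}]),
    riakc_pb_socket:stop(Conn),
    Result.

%% Map function: ignores its input and emits a constant, as in the original test.
args(_Value, _Key, _Arg) ->
    [args].
```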
2i API Questions
Love the new feature; I have a couple of questions, though.

First is the URL to PUT to. The example from the OSCON presentation says to PUT to

    http://whatever:port/buckets/bucket_name/keys/key_name

rather than the old-style URL, e.g.

    http://whatever:port/riak/bucket_name/key_name

I've noticed that after doing the PUT to the new-style URL I can still GET from the old-style URL. Can I still PUT to the old-style URL with an index? I can easily accept that querying an index would involve a different URL, but having the PUT URL potentially differ from the GET URL is an inconsistency I'd like to avoid if possible.

Second is the return value of get_index in the Erlang PB client. It returns results in the format of...

    [[<<"bucket">>,<<"key1">>],
     [<<"bucket">>,<<"key2">>],
     [<<"bucket">>,<<"key3">>],
     ...

But map reduce takes keys in a different format:

    [{<<"bucket">>,<<"key1">>},
     {<<"bucket">>,<<"key2">>},
     {<<"bucket">>,<<"key3">>}
     ...

It's not that it's difficult to convert from one format to the other, but it's inconsistent, and it seems like a waste of time (i.e., I have to run through the list of keys to convert them).

My first question is, am I missing something? i.e., will the first format work anyway? If not, can get_index() be changed to generate its results as a list of bucket/key tuples instead of a list of lists in the first place?

Thanks!
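Until the client changes, the conversion between the two formats is a single list comprehension. The helper below is a sketch with hypothetical module and function names: it turns get_index results of the form [[Bucket, Key], ...] into the [{Bucket, Key}, ...] inputs that mapred accepts. It is still one pass over the keys, so it does not remove that cost; it only keeps the conversion in one place.

```erlang
%% Hypothetical helper: convert 2i results ([[Bucket, Key], ...]) into
%% mapred inputs ([{Bucket, Key}, ...]).
-module(idx_inputs).
-export([to_mapred_inputs/1]).

to_mapred_inputs(IndexResults) ->
    %% The generator pattern [Bucket, Key] silently skips any element
    %% that is not a two-element list.
    [{Bucket, Key} || [Bucket, Key] <- IndexResults].
```

If malformed entries should fail loudly rather than be dropped, lists:map(fun([B, K]) -> {B, K} end, IndexResults) crashes with function_clause on the first bad element instead.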