Map Reduce Requirements

2011-08-22 Thread bill robertson
In order to run a map reduce query against Riak, does the data need to be
stored in JSON? If this isn't a requirement, then how would I run a query
against data stored in Google Protocol Buffers format? Is there an example
of this somewhere?

Thanks!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Fwd: Map Reduce Requirements

2011-08-22 Thread bill robertson
Forgot to reply-all...

-- Forwarded message --
From: bill robertson 
Date: Mon, Aug 22, 2011 at 3:18 PM
Subject: Re: Map Reduce Requirements
To: Jeremiah Peschka 


That makes sense.

Suppose I have a query called Q1. I would like to specify Q1 in JavaScript.
Assume that I can write an Erlang function called F that will translate the
raw GPB bytes into the appropriate JSON for use by Q1. How would I hook F
into the processing of Q1?

I guess that the JavaScript function would be passed the GPB bytes in the
reduce phase, at which point I could call my translation function and operate
on the JSON, and possibly pass on a structure containing the JSON and the GPB
to the next phase.

Does that make sense? Is it possible to invoke arbitrary Erlang functions
within JavaScript like this? If so, are there examples?

Additionally, are secondary indexes meta-data? That is, if I build some
secondary indices, are they stored in some form internal to Riak, and
therefore available for query regardless of the type of data they are
associated with? Is this correct?

Thanks,
Bill Robertson

On Mon, Aug 22, 2011 at 2:57 PM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

> You can MR across whatever kind of data you'd like. JSON is typically used
> because it's very easy to show people how to query JSON and the structure
> makes sense to many programmers.
>
> To MR across anything else, you'll want a library that will translate your
> protocol buffers encoded data into objects that can be parsed in either
> JavaScript or Erlang. That is to say that you'll need a
> Serialization/Deserialization function to translate between data at rest
> (protobufs) and data that the MR program can understand.
>
> Since there are protocol buffer libraries for many languages, this should
> be doable in either JavaScript or Erlang. I don't know of any examples, but
> it shouldn't be much more difficult than Riak.mapValuesJson - provided that
> you can find some easy magic to translate objects for you ;)
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
>


Re: Map Reduce Requirements

2011-08-22 Thread bill robertson
On Mon, Aug 22, 2011 at 3:27 PM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

> I don't think Erlang can talk to JavaScript inside a single
> phase/function/pile of source code. I could be wrong, but it seems to me
> that marshaling data across the JavaScript/Erlang boundary would be hella
> expensive and cause a lot of problems and, as such, probably doesn't exist.
>

It certainly seems sub-optimal.

I wonder if it would be feasible to deploy an Erlang web service in the Riak
node's Webmachine instance that could translate meta-data into Erlang funs
and drive the map reduce operation that way. I'm not sure if I could get
around having specific knowledge of the protobuf structures baked into that
code, but I don't think it matters in this case.
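
A minimal sketch of what the Erlang side of that could look like, assuming
the map phase is written as an Erlang function taking the usual
(Object, KeyData, Arg) arguments. The module name gpb_map_sketch and the
decode/1 helper are made up for illustration; decode/1 only uses
binary_to_term/1 as a stand-in for whatever real protobuf decoder would be
plugged in.

```erlang
-module(gpb_map_sketch).
-export([map_decoded/3, decode/1]).

%% Hypothetical stand-in for a real protobuf decoder. It uses
%% binary_to_term/1 only so that the sketch is self-contained.
decode(Bin) when is_binary(Bin) ->
    binary_to_term(Bin).

%% Map phase function: skip missing objects, otherwise decode the
%% stored bytes and hand the resulting term to the next phase.
%% riak_object:get_value/1 assumes the object has no siblings.
map_decoded({error, notfound}, _KeyData, _Arg) ->
    [];
map_decoded(Object, _KeyData, _Arg) ->
    [decode(riak_object:get_value(Object))].
```

It would be referenced from a query as
{map, {modfun, gpb_map_sketch, map_decoded}, none, true}, provided the
compiled module is on the code path of every Riak node.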

I also wonder how much 1.0 will change this picture.

> > Additionally, are secondary indexes meta-data?  i.e. If I built some
> > secondary indices, these are stored in some form internal to Riak, and
> > therefore available for query regardless of the type of data its
> > associated with. Is this correct?
>
> Secondary indexes are a separate physical structure, or so I gather. (Rusty
> could be full of lies.) They're stored separately from the initial data and
> not as metadata in the object headers. So, yes, you can store whatever you
> want in secondary indexes and query it however you want, provided there's an
> API that supports what you're doing.
>

Would secondary indexes eliminate the need for key-filtering? Logically, it
would seem that you could do the same thing with indexes, but do they have
similar performance characteristics? (i.e. does one suck more than the other?)

Thanks again,
Bill Robertson


Problems Executing Erlang Map Reduce

2011-09-15 Thread bill robertson
I have a 0.14.2 dev database set up from the Riak Fast Track tutorial.

I created an Erlang application that creates a new route for webmachine upon
startup.

configure_webmachine() ->
webmachine_router:add_route(
  {["foo", bar, '*'], myapp_myresource, []}).

I modified vm.args for all three instances to point to the application by
adding the following lines.

-pa /home/me/my_app/ebin
-s my_app

I am able to reach the /foo URL from all three instances of Riak
(localhost:8091-93), so I assume that every instance has access to my
map-reduce code too.

Also included in the application is a module with a test map-reduce query in
it. I have a binary encoded Erlang term stored in the database, and I am
down to just trying to retrieve the one entry and return 'something' from a
map function (i.e. I've been cutting things out until there is nothing left
to cut out), and I am still receiving errors.

Here is the mapred code.

Conn = riakc_pb_socket:start("127.0.0.1", 8091),
A = fun(Value, Key, Arg) -> args(Value, Key, Arg) end,
Result = riakc_pb_socket:mapred(Conn, [{<<"bucket">>, <<"key">>}],
[{map, {qfun, A}, none, true}]),

args(_Value, _Key, _Arg) ->
[args].

This code bombs, and then I get the following back from webmachine in the
browser (binary goo elided).

Internal Server Error
The server encountered an error while processing this request:

{error,
{exit,
{{function_clause,
 [{gen,call,
  [{ok,<0.268.0>},
   '$gen_call',
   {req,
   {rpbmapredreq,
   <<131,108, ...,106>>,
   <<"application/x-erlang-binary">>},
   60100,
   {1061003,<0.252.0>}},
   6]},
  {gen_server,call,3},
  {riakc_pb_socket,mapred,5},
  {myapp_mapred,do_mapred,6},
  {myapp_myresource,to_html,2},
  {webmachine_resource,resource_call,3},
  {webmachine_resource,do,3},
  {webmachine_decision_core,resource_call,1}]},
 {gen_server,call,
 [{ok,<0.268.0>},
  {req,
  {rpbmapredreq,
  <<131,108, ...,106>>,
  <<"application/x-erlang-binary">>},
  60100,
  {1061003,<0.252.0>}},
  6]}},
[{gen_server,call,3},
 {riakc_pb_socket,mapred,5},
 {myapp_mapred,do_mapred,6},
 {myapp_myresource,to_html,2},
 {webmachine_resource,resource_call,3},
 {webmachine_resource,do,3},
 {webmachine_decision_core,resource_call,1},
 {webmachine_decision_core,decision,1}]}}

sasl-error.log and erlang.log have the same error information as the browser
on the node that I made the request to. I checked the logs of one of the
other nodes and found nothing.

I must be doing something wrong, but I have no idea what.

Any suggestions as to what to do?


Re: Problems Executing Erlang Map Reduce

2011-09-15 Thread bill robertson
I finally found it after looking at the fast track tutorial again.

Mistake #1 -> erlang noob

Conn = riakc_pb_socket:start("127.0.0.1", 8091)

is wrong; it should be

{ok, Conn} = ...

After that, I also had the port wrong (it should be 8081-8083 for the Fast
Track dev databases).
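
For the archives, here is a sketch of the corrected call. It assumes a Fast
Track dev node listening on 8081 and keeps the placeholder bucket and key
from my first message; the module name mapred_fix_sketch is made up. The
stack trace shows {ok,<0.268.0>} being passed to gen:call where a pid was
expected, which is what matching on {ok, Conn} fixes.

```erlang
-module(mapred_fix_sketch).
-export([run/0, args/3]).

run() ->
    %% start/2 returns {ok, Pid}; match on the tuple so that Conn is
    %% the pid itself rather than the whole {ok, Pid} tuple.
    {ok, Conn} = riakc_pb_socket:start("127.0.0.1", 8081),
    A = fun(Value, Key, Arg) -> args(Value, Key, Arg) end,
    Result = riakc_pb_socket:mapred(Conn, [{<<"bucket">>, <<"key">>}],
                                    [{map, {qfun, A}, none, true}]),
    riakc_pb_socket:stop(Conn),
    Result.

%% Trivial map function: ignore the inputs and return a fixed list.
args(_Value, _Key, _Arg) ->
    [args].
```

Because the map phase is a qfun that calls back into this module, the
compiled module still has to be on the code path of the Riak nodes, as with
the -pa line in vm.args.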

Thanks!




2i API Questions

2011-09-29 Thread bill robertson
Love the new feature. I have a couple of questions, though.

First is the URL to PUT to. The example from the OSCON presentation says to
PUT to

http://whatever:port/buckets/bucket_name/keys/key_name

rather than the old style URL e.g.

http://whatever:port/riak/bucket_name/key_name

I've noticed that after doing the PUT to the new style URL I can still GET
from the old style URL. Can I still PUT to the old style URL with an index?
I can easily accept that doing a query on an index would involve a different
URL, but having the PUT URL be potentially different from the GET URL is an
inconsistency I'd like to avoid if possible.

Second is the return value of get_index in the Erlang PB client. It returns
results in the format of...

[[<<"bucket">>,<<"key1">>],
 [<<"bucket">>,<<"key2">>],
 [<<"bucket">>,<<"key3">>],
...

But map reduce takes keys in a different format:

[{<<"bucket">>,<<"key1">>},
 {<<"bucket">>,<<"key2">>},
 {<<"bucket">>,<<"key3">>}
...

It's not that it's difficult to convert from one format to the other, but
it's inconsistent, and it seems like a waste of time (i.e. I have to run
through the list of keys to convert it). My first question is, am I missing
something? That is, will the first format work anyway?
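
For reference, the conversion I mean is a single pass with a list
comprehension. A minimal sketch, with a made-up module and function name:

```erlang
-module(index_keys_sketch).
-export([to_mapred_inputs/1]).

%% Turn [[Bucket, Key], ...] into [{Bucket, Key}, ...].
%% Elements that are not two-element lists are silently skipped,
%% because the generator pattern filters them out.
to_mapred_inputs(Pairs) ->
    [{Bucket, Key} || [Bucket, Key] <- Pairs].
```

The result can then be passed as the inputs argument of the map reduce call.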

If not, then can get_index() be changed to generate its results as a list of
bucket/key tuples instead of a list of lists in the first place?

Thanks!