Erlang Client: get_update_metadata vs get_metadata

2012-06-12 Thread Andrew Berman
Can someone explain the difference between the get_update_metadata and
get_metadata functions in the Erlang PB Client for Riak?  It's very
confusing...

Thanks,

Andrew


Re: Erlang Client: get_update_metadata vs get_metadata

2012-06-13 Thread Andrew Berman
Thanks guys,

I'm more inclined to have an API like get_original_metadata and
get_metadata, where get_metadata always returns whatever metadata is
currently set on the object, new or original.  In the current API, if
get_update_metadata returns the original metadata when there are no
changes, then I kinda fail to see a use case for the get_metadata call.
Anyone?

Thanks again!

Andrew

On Tue, Jun 12, 2012 at 2:37 PM, Michael Radford  wrote:

> Reid,
>
> I do understand why update_metadata exists. I guess what I'm
> suggesting is a better default behavior, especially for users who
> don't explicitly set any metadata values. (Or even if they do, for
> when all the metadatas are equivalent.)
>
> I.e., something like this for riakc_obj:get_update_metadata:
>
> get_update_metadata(#riakc_obj{updatemetadata=UM}=Object) ->
>     case UM of
>         undefined ->
>             try
>                 get_metadata(Object)
>             catch
>                 throw:no_metadata ->
>                     dict:new();
>                 throw:siblings ->
>                     default_resolve_metadatas(get_metadatas(Object))
>             end;
>         UM ->
>             UM
>     end.
>
> default_resolve_metadatas(Ms = [M | _]) ->
>     UniqueWritten = lists:usort([ [KV || KV = {K, _V} <- dict:to_list(Md),
>                                          K =/= ?MD_LASTMOD,
>                                          K =/= ?MD_INDEX]
>                                   || Md <- Ms ]),
>     case UniqueWritten of
>         [_]        -> M;
>         [_, _ | _] -> throw(siblings)
>     end.
>
> Mike
>
> On Tue, Jun 12, 2012 at 1:18 PM, Reid Draper  wrote:
> >
> > On Jun 12, 2012, at 2:56 PM, Michael Radford wrote:
> >
> >> get_metadata returns the metadata that was read from riak. But if
> >> allow_mult is true, and there is more than one sibling, then
> >> get_metadata throws the exception 'siblings'. You have to call
> >> get_metadatas to get a list with metadata for each sibling in that
> >> case.
> >>
> >> get_update_metadata returns the metadata that is to be written for the
> >> object (if you were to call riakc_pb_socket:put at that point). The
> >> update metadata is either a single value set explicitly with
> >> riakc_obj:update_metadata, or if none was set, and there is only one
> >> sibling, then the default is the value of get_metadata.
> >>
> >> A related question: if I'm not using any user-specified metadata at
> >> all, but I do have allow_mult turned on, then how do I choose which
> >> metadata to write back to riak after resolving the conflict? Or could
> >> I just call update_metadata with an empty dict in that case?
> > I'd recommend calling update_metadata with an empty dict. Be sure
> > to set the content_type as well.
> >>
> >> Right now, I have some conflict resolution code that uses the same
> >> default strategy as mochimedia's statebox_riak library, which
> >> arbitrarily chooses the first metadata. But this seems less than
> >> ideal: everything in the metadata is coming from riak, and some of it
> >> (e.g., last-modified timestamps) must be ignored when doing the
> >> update. So it seems like riak should be able to resolve the "metadata
> >> conflict" on its own: just prune all the metadata keys that aren't
> >> actually written, and then if the resulting pruned metadatas are
> >> identical, then there's no conflict. Or, if there is some reason why
> >> the user should prefer one metadata over another, then the client
> >> library should give the user some way to decide.
> > There are definitely cases where the user wants to choose one metadata
> > over another, or perhaps more commonly, "merge" them together, according
> > to some conflict resolution semantics. The client provides
> `update_metadata`
> > for this reason. `select_sibling/2` can be used to choose a particular
> {Metadata, Value}
> > pair as well.
> >>
> >> Mike
> >>
> >> On Tue, Jun 12, 2012 at 11:25 AM, Andrew Berman 
> wrote:
> >>> Can someone explain the difference between the get_update_metadata and
> >>> get_metadata functions in the Erlang PB Client for Riak?  It's very
> >>> confusing...
> >>>
> >>> Thanks,
> >>>
> >>> Andrew
> >>>
> >
>
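
A minimal sketch of the resolution flow Reid describes above, assuming the
riakc_pb_socket/riakc_obj API from the Erlang PB client (the bucket, key,
and merge_values/1 function are hypothetical, for illustration only):

    resolve_and_put(Pid, Bucket, Key) ->
        {ok, Obj} = riakc_pb_socket:get(Pid, Bucket, Key),
        Obj2 = case riakc_obj:value_count(Obj) of
                   1 -> Obj;
                   _ ->
                       %% Merge the sibling values with app-specific
                       %% semantics (merge_values/1 is hypothetical), then
                       %% write back with an empty metadata dict and an
                       %% explicit content type, per Reid's advice above.
                       Merged = merge_values(riakc_obj:get_values(Obj)),
                       O1 = riakc_obj:update_value(Obj, Merged),
                       O2 = riakc_obj:update_metadata(O1, dict:new()),
                       riakc_obj:update_content_type(O2, "application/json")
               end,
        riakc_pb_socket:put(Pid, Obj2).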


Re: riak-erlang-client search changes

2012-08-02 Thread Andrew Berman
The more fields, the better.  I like this change.

Andrew
On Aug 2, 2012 8:17 AM, "Dave Parfitt"  wrote:

> Hello -
>
> We're considering some changes to the Riak Search functionality in
> riak-erlang-client for the upcoming Riak 1.2 release. Currently, the
> riakc_pb_socket:search/* functions return a list of the form:
> [[Index, Id],[Index2,Id2],...]
>
> With the new Riak Search protobuffs messages, we have the ability to also 
> return fields from the search doc in the results (as additional values in the 
> tuple). Also, it's possible to return the search results' "max score" and
> "number found". Does anyone have any objections to returning additional 
> fields? To maintain semi-compatible behavior, it's possible to use the fl 
> (field limit) search option to just return the id.
>
> Current behavior:
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>).
> {ok,[[<<"phrases_custom">>,<<"5">>],
>  [<<"phrases_custom">>,<<"1">>]]}
>
> Proposed behavior:
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>).
> {ok,[{<<"phrases_custom">>,
>       [{<<"id">>,<<"1">>},
>        {<<"phrase">>,<<"The quick brown fox jumps over the lazy dog">>}]},
>      {<<"phrases_custom">>,
>       [{<<"id">>,<<"5">>},
>        {<<"phrase">>,<<"The quick brown fox jumps over the lazy dog">>}]}],
>  0.0,2}
> %% Note the last two fields of the result are Max Score and Number Found.
>
> Semi-compatible behavior by specifying the fl (with the exception of max 
> score and number found):
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>, 
> [{fl,[<<"id">>]}], 5000, 5000).
> {ok,[{<<"phrases_custom">>,[{<<"id">>,<<"1">>}]},
>      {<<"phrases_custom">>,[{<<"id">>,<<"5">>}]}],
>  0.0,2}
>
> Cheers -
> Dave
>
>
>
>
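
A sketch of consuming the proposed shape, assuming the result really does
come back as a plain {ok, Docs, MaxScore, NumFound} tuple as shown above
(the final API may well differ):

    {ok, Docs, _MaxScore, _NumFound} =
        riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>),
    %% Pull just the ids back out of the per-doc field lists.
    Ids = [Id || {_Index, Fields} <- Docs, {<<"id">>, Id} <- Fields].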


Re: Riak won't start -- RHEL6

2012-08-25 Thread Andrew Berman
What do your config files look like? Do you have proper permissions on the
Riak directory?
On Aug 25, 2012 10:10 AM, "Vladimir Kupcov" 
wrote:

> Hi,
>
> I installed Riak from .rpm package on RHEL6 on virtual machine.
> Unfortunately, I can't get Riak to start. Here is the console output:
>
>
>
> [idcuser@vhost0536 ~]$ riak console
> Attempting to restart script through sudo -H -u riak
> Exec: /usr/lib64/riak/erts-5.9.1/bin/erlexec -boot
> /usr/lib64/riak/releases/1.2.0/riak -embedded -config
> /etc/riak/app.config -pa /usr/lib64/riak/basho-patches
> -args_file /etc/riak/vm.args -- console
> Root: /usr/lib64/riak
> {error_logger,{{2012,8,25},{4,0,21}},"Protocol: ~p: register error:
> ~p~n",["inet_tcp",{{badmatch,{error,etimedout}},[{inet_tcp_dist,listen,1,[{file,"inet_tcp_dist.erl"},{line,70}]},{net_kernel,start_protos,4,[{file,"net_kernel.erl"},{line,1314}]},{net_kernel,start_protos,3,[{file,"net_kernel.erl"},{line,1307}]},{net_kernel,init_node,2,[{file,"net_kernel.erl"},{line,1197}]},{net_kernel,init,1,[{file,"net_kernel.erl"},{line,357}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}]}
>
> {error_logger,{{2012,8,25},{4,0,21}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}},{ancestors,[net_sup,kernel_sup,<0.10.0>]},{messages,[]},{links,[#Port<0.194>,<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,24},{reductions,507}],[]]}
>
> {error_logger,{{2012,8,25},{4,0,21}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[['riak@127.0.0.1',longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
>
> {error_logger,{{2012,8,25},{4,0,21}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
>
> {error_logger,{{2012,8,25},{4,0,21}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
> {"Kernel pid
> terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"}
>
> Crash dump was written to: /var/log/riak/erl_crash.dump
> Kernel pid terminated (application_controller)
> ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
>
>
>
> Any suggestions?
>
> Thank you,
> Vlad


Riak and Zookeeper

2013-03-22 Thread Andrew Berman
Hello,

I'm wondering if anyone has explored the idea of using Zookeeper in front
of Riak to handle locking.  My thought is that a client goes to Zookeeper
to get a lock on a key before updating.  Any other client that wishes to
update the same key must check for the existence of the lock.  If it
exists, an error is thrown; if not, the update proceeds.  Once the client
is finished with the key, it releases the lock.

--Andrew
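
For concreteness, a sketch of that flow; the zk module here is hypothetical
(no particular Erlang ZooKeeper client is implied), and the replies below
cover why this pattern is hard to get right:

    update_with_lock(Zk, Riak, Bucket, Key, UpdateFun) ->
        LockPath = <<"/locks/", Bucket/binary, "/", Key/binary>>,
        %% An ephemeral znode doubles as the lock: create fails if it exists.
        case zk:create(Zk, LockPath, ephemeral) of
            {error, node_exists} ->
                {error, locked};
            {ok, _} ->
                try
                    {ok, Obj} = riakc_pb_socket:get(Riak, Bucket, Key),
                    New = UpdateFun(riakc_obj:get_value(Obj)),
                    riakc_pb_socket:put(Riak, riakc_obj:update_value(Obj, New))
                after
                    zk:delete(Zk, LockPath)  %% release the lock
                end
        end.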


Re: Riak and Zookeeper

2013-03-25 Thread Andrew Berman
Thanks for the info guys!


On Fri, Mar 22, 2013 at 7:42 PM, Andrew Stone  wrote:

> You may also want to have a look at this post by Aphyr. There are a LOT of
> caveats when trying to do this sort of thing.
>
> http://aphyr.com/posts/254-burn-the-library
>
> -Andrew
>
>
> On Fri, Mar 22, 2013 at 9:02 PM, Sean Cribbs  wrote:
>
>> Datomic does something similar -- except that instead of updating keys
>> in-place, it only adds new values to Riak and advances the pointer(s)
>> to the current state in ZK.
>> http://www.infoq.com/presentations/Deconstructing-Database
>>
>> On Fri, Mar 22, 2013 at 7:32 PM, Andrew Berman  wrote:
>> > Hello,
>> >
>> > I'm wondering if anyone has explored the idea of using Zookeeper in
>> front of
>> > Riak to handle  locking.  My thought is that a client goes to Zookeeper
>> to
>> > get a lock on a key before updating.  Any other client that wishes to
>> update
>> > the same key must check for the existence of a lock.  If it exists, an
>> error
>> > is thrown, if not, then it proceeds.  Once the client is finished with
>> the
>> > key, it releases the lock.
>> >
>> > --Andrew
>> >
>> >
>>
>>
>>
>> --
>> Sean Cribbs 
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>>
>
>


Re: Urgent help with a down node.

2013-07-08 Thread Andrew Berman
Bryan,

What version of Erlang?  You should check this out:
https://github.com/basho/riak_kv/issues/411

BTW - Google is your friend, which is how I found the above issue :)

--Andrew


On Sun, Jul 7, 2013 at 3:01 PM, Bryan Hughes  wrote:

>  Hi Mark,
>
> DOH - sorry for the lack of detail.  Didn't have enough coffee this morning.
>
> OS: CentOS release 6.3 (Final)
> Riak:   Riak 1.2.1
>
> Hadn't had a chance to upgrade to 1.3 yet.
>
> Got the node back up, but I'm not entirely sure why, which is a little
> concerning.  I've been verifying the data, and everything looks intact.  When
> I try to run riak-admin status, I get the following (note: I am not entirely
> sure this was the case when we first set the node up):
>
> $ riak-admin status
> Status failed, see log for details
>
> The logs shows:
>
> 2013-07-07 14:55:03.858 [error] <0.12982.0>@riak_kv_console:status:173
> Status failed error:function_clause
> 2013-07-07 14:55:03.858 [error] emulator Error in process <0.12983.0> on
> node 'riak@127.0.0.1' with exit value:
> {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer...
>
>
> This is on a dev cluster with an out-of-the box configuration using
> bitcask.
>
> Thanks!
>
> Bryan
>
>
> On 7/7/13 2:51 PM, Mark Phillips wrote:
>
> Hi Bryan,
>
>  I remember seeing something similar on the list a while ago. I'll dig
> through the archives (Riak.markmail.org) if I have a few minutes later
> tonight.
>
>  In the mean time, what version of Riak is this? And what OS?
>
>  Mark
>
> On Sunday, July 7, 2013, Bryan Hughes wrote:
>
>>  Anyone familiar with this error message?
>>
>> 2013-07-07 12:51:42 =ERROR REPORT
>> Hintfile
>> './data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
>> contains pointer 16555635 566 that is greater than total data size 16556032
>> 2013-07-07 12:51:45 =ERROR REPORT
>> Hintfile
>> './data/bitcask/114179815416476790484662877555959610910619729920/3.bitcask.hint'
>> contains pointer 17817310 567 that is greater than total data size
>> 17817600
>> 2013-07-07 12:51:46 =ERROR REPORT
>> Hintfile
>> './data/bitcask/159851741583067506678528028578343455274867621888/3.bitcask.hint'
>> contains pointer 7573448 567 that is greater than total data size 7573504
>> 2013-07-07 12:51:46 =ERROR REPORT
>> Bad datafile entry 1:
>> {ok,<<131,104,2,109,0,0,0,9,65,80,73,67,79,85,78,84,83,109,0,0,0,33,55,56,54,57,52,49,56,49,94,103,111,115,101,114,118,105,99,101,95,99>>}
>> 2013-07-07 12:51:56 =ERROR REPORT
>> Hintfile
>> './data/bitcask/730750818665451459101842416358141509827966271488/3.bitcask.hint'
>> contains pointer 13229833 581 that is greater than total data size 13230080
>> 2013-07-07 12:52:05 =ERROR REPORT
>> Hintfile
>> './data/bitcask/1187470080331358621040493926581979953470445191168/3.bitcask.hint'
>> contains pointer 23465420 578 that is greater than total data size 23465984
>> 2013-07-07 12:52:06 =ERROR REPORT
>> Hintfile
>> './data/bitcask/1210306043414653979137426502093171875652569137152/3.bitcask.hint'
>> contains pointer 27733824 578 that is greater than total data size 27734016
>> 2013-07-07 12:52:07 =ERROR REPORT
>> Hintfile
>> './data/bitcask/1233142006497949337234359077604363797834693083136/3.bitcask.hint'
>> contains pointer 15014008 578 that is greater than total data size
>> 15014586
>> 2013-07-07 12:54:43 =ERROR REPORT
>> Bad datafile entry, discarding(383/566 bytes)
>> 2013-07-07 12:54:45 =ERROR REPORT
>> Bad datafile entry, discarding(276/567 bytes)
>> 2013-07-07 12:54:46 =ERROR REPORT
>> Bad datafile entry, discarding(42/567 bytes)
>> 2013-07-07 12:54:57 =ERROR REPORT
>> Bad datafile entry, discarding(233/581 bytes)
>> 2013-07-07 12:55:06 =ERROR REPORT
>> Bad datafile entry, discarding(550/578 bytes)
>> 2013-07-07 12:55:07 =ERROR REPORT
>> Bad datafile entry, discarding(178/578 bytes)
>> 2013-07-07 12:56:00 =ERROR REPORT
>> Error in process <0.1536.0> on node 'riak@127.0.0.1' with exit value:
>> {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer...
>>
>> --
>>
>> Bryan Hughes
>> *Go Factory*
>> http://www.go-factory.net
>>
>> *"Internet Class, Enterprise Grade"*
>>
>>
>>
> --
>
> Bryan Hughes
> CTO and Founder / *Go Factory*
> (415) 515-7916
>
> http://www.go-factory.net
>
> *"Internet Class, Enterprise Grade"*
>
>
>
>
>


Links vs Key Filters for Performance

2011-05-05 Thread Andrew Berman
I was curious if anyone has any thoughts on what is more performant, links
or key filters, for secondary lookups.  For example:

I want to be able to look up a user by id and email:

*Link implementation:*

Two buckets: user and user_email, where id is the key of user and email is
the key of user_email.  User_email contains no data but simply has a link
pointing back to the proper user.

*Key Filter:*

One bucket: user, where id_email is the key of the bucket.  Lookups would
use a key filter tokenizing the id and then looking up the id or email based
on the proper token.

Obviously both work, but I'm curious what the implications are from a
performance standpoint.

Thanks,

Andrew
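
For concreteness, a sketch of the two lookups under the scheme above,
assuming the riak-erlang-client API (bucket names and the "_" separator
come from the example; function names are made up):

    %% Link approach: constant-time -- two gets, the second driven by the
    %% link stored in the user_email object's metadata.
    user_by_email(Pid, Email) ->
        {ok, EObj} = riakc_pb_socket:get(Pid, <<"user_email">>, Email),
        MD = riakc_obj:get_metadata(EObj),
        [{{<<"user">>, UserKey}, _Tag} | _] = dict:fetch(<<"Links">>, MD),
        riakc_pb_socket:get(Pid, <<"user">>, UserKey).

    %% Key-filter approach: every key in the bucket is tokenized and tested,
    %% so the lookup is linear in the number of keys.
    user_by_id(Pid, Id) ->
        Inputs = {<<"user">>, [[<<"tokenize">>, <<"_">>, 1], [<<"eq">>, Id]]},
        riakc_pb_socket:mapred(Pid, Inputs,
            [{map, {modfun, riak_kv_mapreduce, map_object_value},
              none, true}]).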


Re: Links vs Key Filters for Performance

2011-05-05 Thread Andrew Berman
Ah, that makes sense.  So is it the case that using the link implementation
will always be faster?  Or are there cases where it makes more sense to use
a key filter?

Thanks!

--Andrew

On Thu, May 5, 2011 at 3:44 PM, Aphyr  wrote:

> The key filter still has to walk the entire keyspace, which will make
> fetches an O(n) operation as opposed to O(1).
>
> --Kyle
>
>
> On 05/05/2011 03:35 PM, Andrew Berman wrote:
>
>> I was curious if anyone has any thoughts on what is more performant,
>> links or key filters in terms of secondary links.  For example:
>>
>> I want to be able to look up a user by id and email:
>>
>> *Link implementation:*
>>
>> Two buckets: user and user_email, where id is the key of user and email
>> is the key of user_email.  User_email contains no data but simply has a
>> link pointing back to the proper user.
>>
>> *Key Filter:*
>>
>> One bucket: user, where id_email is the key of the bucket.  Lookups
>> would use a key filter tokenizing the id and then looking up the id or
>> email based on the proper token.
>>
>> Obviously both work, but I'm curious what the implications are from a
>> performance standpoint.
>>
>> Thanks,
>>
>> Andrew
>>
>>
>>
>>
>


Re: Links vs Key Filters for Performance

2011-05-05 Thread Andrew Berman
Ok, cool.  Thanks for the example!

On Thu, May 5, 2011 at 3:51 PM, Aphyr  wrote:

> I suppose if you had a really small number of keys in Riak it might be
> faster, but you're almost certainly better off maintaining a second object
> and making the lookup constant time. Here's an example:
>
> https://github.com/aphyr/risky/blob/master/lib/risky/indexes.rb
>
> --Kyle
>
>
> On 05/05/2011 03:49 PM, Andrew Berman wrote:
>
>> Ah, that makes sense.  So is it the case that using the link
>> implementation will always be faster?  Or are there cases where it makes
>> more sense to use a key filter?
>>
>> Thanks!
>>
>> --Andrew
>>
>> On Thu, May 5, 2011 at 3:44 PM, Aphyr wrote:
>>
>>The key filter still has to walk the entire keyspace, which will
>>make fetches an O(n) operation as opposed to O(1).
>>
>>--Kyle
>>
>>
>>On 05/05/2011 03:35 PM, Andrew Berman wrote:
>>
>>I was curious if anyone has any thoughts on what is more
>> performant,
>>links or key filters in terms of secondary links.  For example:
>>
>>I want to be able to look up a user by id and email:
>>
>>*Link implementation:*
>>
>>Two buckets: user and user_email, where id is the key of user
>>and email
>>is the key of user_email.  User_email contains no data but
>>simply has a
>>link pointing back to the proper user.
>>
>>*Key Filter:*
>>
>>One bucket: user, where id_email is the key of the bucket.  Lookups
>>would use a key filter tokenizing the id and then looking up the
>>id or
>>email based on the proper token.
>>
>>Obviously both work, but I'm curious what the implications are
>>from a
>>performance standpoint.
>>
>>Thanks,
>>
>>Andrew
>>
>>
>>
>>
>>
>>
>


Re: Links vs Key Filters for Performance

2011-05-05 Thread Andrew Berman
Yes, but this would be a bucket where each key would only ever have one link
pointing back to the original user.

--Andrew

On Thu, May 5, 2011 at 3:52 PM, Jason J. W. Williams <
jasonjwwilli...@gmail.com> wrote:

> On Thu, May 5, 2011 at 4:49 PM, Andrew Berman  wrote:
> > Ah, that makes sense.  So is it the case that using the link
> implementation
> > will always be faster?  Or are there cases where it makes more sense to
> use
> > a key filter?
>
> There's a practical limit to how many links you can walk before
> performance becomes unacceptable.
>
> -J
>


Re: Link walking

2011-05-06 Thread Andrew Berman
I don't totally understand what you're doing in your code, but it looks like
you have the map phase before the link phase, which doesn't make sense,
since you want the data from the link phase passed on to the map phase, not
the other way around.

On Fri, May 6, 2011 at 9:37 AM, Joshua Hanson wrote:

> In my map/reduce query I would like to keep the input data
> and also data from the link-walking phase.
>
> Here is some sample code:
>
> // insert message
> db.save('messages', 'josh-123', 'secret message', function(err, message) {
>   // insert people object with a link to the message we just inserted
>   db.save('people', 'josh', {'profession': 'developer'},
>     { links: [{ bucket: 'messages', key: 'josh-123', 'tag': 'message' }]},
>     function(err, data) {
>       db.add([['people', 'josh']])
>         .map({ 'source': 'Riak.mapValuesJson', 'keep': true })
>         .link({ 'bucket': 'messages' })
>         .run(function(err, data) {
>           if (err) return console.log(err);
>           console.log(data);
>         });
>     }
>   );
> });
>
> So, if I remove the 'link' phase I get the correct object from the map
> phase
> but if I instead remove the 'map' phase, I get the correct object from link
> phase.
>
> However, having both together does not work. Is it possible to get at both
> the original data and the data from link-walking in the same query?
> _
> Joshua Hanson
> e: joshua.b.han...@gmail.com
>
>
>


Re: Link walking

2011-05-06 Thread Andrew Berman
The results from one phase get passed on to the next phase, so if you want
both data sets you need to run two different map-reduce queries.  If you
want the results from the link phase you need to run a link phase in
addition to a map phase (link first and then you can do the map phase with
Riak.mapValuesJson, which will give you the object that the link is pointing
to).

On Fri, May 6, 2011 at 12:14 PM, Joshua Hanson wrote:

> The initial map phase is to grab the values from the input phase (.add()).
> I want these values as well as the ones exposed from .link() but not sure
> how to express it.
>
> _
> Joshua Hanson
> e: joshua.b.han...@gmail.com
>
>
> On Fri, May 6, 2011 at 3:01 PM, Andrew Berman  wrote:
>
>> I don't totally understand what you're doing in your code, but it looks
>> like you have the map phase before the link phase which doesn't make sense
>> since you want the data from the link phase passed on to the map phase, not
>> the other way around.
>>
>> On Fri, May 6, 2011 at 9:37 AM, Joshua Hanson 
>> wrote:
>>
>>> In my map/reduce query I would like to keep the input data
>>> and also data from the link-walking phase.
>>>
>>> Here is some sample code:
>>>
>>> #insert message
>>> db.save('messages', 'josh-123', 'secret message', function(err, message)
>>> {
>>> #insert people object with link to message we just inserted
>>>  db.save('people', 'josh', {'profession': 'developer'},
>>>  { links: [{ bucket: 'messages', key: 'josh-123', 'tag': 'message' }]},
>>>  function(err, data) {
>>>  db.add([['people', 'josh']])
>>> .map({ 'source': 'Riak.mapValuesJson', 'keep': true})
>>>  .link({ 'bucket': 'messages'})
>>> .run(function(err, data) {
>>>  if (err) return console.log(err);
>>>  console.log(data);
>>> })
>>>  }
>>> )
>>> })
>>>
>>> So, if I remove the 'link' phase I get the correct object from the map
>>> phase
>>> but if I instead remove the 'map' phase, I get the correct object from
>>> link phase.
>>>
>>> However, having both together does not work. Is it possible to get at
>>> both the original data and the data from link-walking in the same query?
>>> _
>>> Joshua Hanson
>>> e: joshua.b.han...@gmail.com
>>>
>>>
>>>
>>
>


Re: Error when trying to use a javascript custom extractor in Riaksearch

2011-05-21 Thread Andrew Berman
I'll preface this by saying I've never used this feature...

rs_extractfun should be set to one of the values defined in the Other
Encodings section (
http://wiki.basho.com/Riak-Search---Indexing-and-Querying-Riak-KV-Data.html).
In your case, {jsanon, "function(a,b){return{\"user\":\"gpascale\",
\"name\":\"greg\"};}"}

Hope that helps,

Andrew

On Sat, May 21, 2011 at 7:48 PM, Greg Pascale  wrote:

> I've been banging my head against the wall trying to get a javascript
> custom extractor working. Here is the simplest example I could come up with
> to reproduce the error.
>
> curl -v -X PUT -H "Content-Type: application/json"
> http://localhost:8098/riak/test -d @data
>
> where @data is a file that looks like
>
> {"props":
>  {"rs_extractfun":
>   {"language" : "javascript",
>    "source" : "function(a,b){return{\"user\":\"gpascale\",
> \"name\":\"greg\"};}"
>   }
>  }
> }
>
> This completes successfully, and I can verify it by looking at the
> properties of the "test" bucket.
>
> *{"props":{"allow_mult":true,"basic_quorum":true,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"test","notfound_ok":false,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}],"pw":0,"r":"quorum","rs_extractfun":{"language":"javascript","source":"function(a,b){return{\"user\":\"gpascale\",
> \"name\":\"greg\"};}"},"rw":"quorum","small_vclock":10,"w":"quorum","young_vclock":20}}
> *
>
> However, when I try to insert something into the bucket, I get an error
>
> curl -X PUT http://localhost:8098/riak/test/test1 -d "Hello, world!"
>
> {error,badarg,
>  [{erlang,iolist_to_binary,
>    [{hook_crashed,
>      {riak_search_kv_hook,precommit,exit,
>       {noproc,
>        {gen_server,call,
>         [riak_search_js_extract,reserve_vm,infinity]}}}}]},
>   {wrq,append_to_response_body,2},
>   {riak_kv_wm_raw,accept_doc_body,2},
>   {webmachine_resource,resource_call,3},
>   {webmachine_resource,do,3},
>   {webmachine_decision_core,resource_call,1},
>   {webmachine_decision_core,accept_helper,0},
>   {webmachine_decision_core,decision,1}]}
>
> It doesn't matter if the thing I insert is a string, as above, or a real JSON
> object that matches my schema - the error is the same. Any ideas what might
> be going on here?
>
> Thanks,
> -Greg
>
>
>


Re: Problems starting riak

2011-06-04 Thread Andrew Berman
Have you tried it with just a default Erlang build (just a ./configure)?

On Sat, Jun 4, 2011 at 1:50 PM, Alvaro Videla  wrote:

>
> Hi,
>
> @roidrage told me to use an older version of Erlang. Problem is another
> library I want to use only compiles with latest Erlang.
>
> On Jun 4, 2011, at 10:48 PM, Jason J. W. Williams wrote:
>
> > Have you tried with Erlang R14B02?
> >
> > Sent via iPhone
> >
> > Is your email Premiere?
> >
> > On Jun 4, 2011, at 5:25, Alvaro Videla  wrote:
> >
> >> Hi,
> >>
> >> I'm trying to build riak using the latest Erlang release built with
> these options:
> >>
> >> ./configure --enable-smp-support --enable-darwin-64bit --enable-kernel-poll
> >>
> >> I've got riak using: git clone git://github.com/basho/riak.git
> >>
> >> After I did *make rel* I tried bin/riak console
> >>
> >> And I got the following errors:
> >>
> >> The on_load function for module bitcask_nifs returned
> >>   {error,{bad_lib,"Library version (1.0) not compatible (with 2.2)."}}
> >>
> >> And:
> >>
> >> =INFO REPORT==== 4-Jun-2011::14:20:15 ===
> >>   alarm_handler: {clear,{disk_almost_full,"/"}}
> >> {"Kernel pid terminated",application_controller,"{application_start_failure,riak_kv,{shutdown,{riak_kv_app,start,[normal,[]]}}}"}
> >>
> >> If I run df -h it shows that I have available 32GB on my HD.
> >>
> >> head erl_crash.dump
> >>
> >> =erl_crash_dump:0.1
> >> Sat Jun  4 14:20:16 2011
> >> Slogan: Kernel pid terminated (application_controller) ({application_start_failure,riak_kv,{shutdown,{riak_kv_app,start,[normal,[]]}}})
> >> System version: Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:64] [hipe] [kernel-poll:true]
> >>
> >> Here's the full output: https://gist.github.com/1007857
> >>
> >> Any help or hints?
> >>
> >> Cheers,
> >>
> >> Alvaro
>
> Sent form my Nokia 1100
>
>
>
>
>


Re: Riak Recap as a Blog (?)

2011-06-06 Thread Andrew Berman
+1, and I love the idea of an RSS feed also.

On Sun, Jun 5, 2011 at 11:51 PM, Mark Phillips  wrote:

> Hey All -
>
> Quick question: how would you feel if we turned the Riak Recap into a blog?
>
> I've spoken with various people in various channels about how to best
> deliver the Recap, and while it's clear that it's a valuable tool for
> the community, I'm not sure the Mailing List is still the best vehicle
> through which to publish it.
>
> Publishing it as a blog (perhaps at "recap.basho.com") makes a lot of
> sense as it would enable people to consume it without having to sift
> through the rest of the mailing list traffic (and I know there are
> more than a few of you who are on this ML only for the Recaps). More
> importantly, I think it would bring more new readers to the Recap (and
> more users to Riak).
>
> So, in the interest of convenience and expanding the size of the Riak
> community, I think making it a blog might make sense. It would still
> be written, published, and tweeted thrice weekly, just delivered to
> you in your Reader, for example, instead of on the ML.
>
> As you all are the primary consumers of the Recap, I thought I would
> gather some opinions before I did anything drastic. Anyone have
> thoughts on this?
>
> +/-1s, rants, and all other expressions of opinion are encouraged.
>
> Thanks,
>
> Mark
>
> Community Manager
> Basho Technologies
> wiki.basho.com
> twitter.com/pharkmillups
>
>


Has there been any talk of dropping the PB interface?

2011-06-07 Thread Andrew Berman
I'm curious if there has been any talk of dropping the protocol buffers
interface in favor of one of the more user-friendly serialization libraries
which support more languages, like BERT (http://bert-rpc.org/) or
MessagePack (http://msgpack.org/).  I would think BERT is a perfect fit for
Riak, since it uses Erlang's native binary term format, which would make
exposing the Erlang client pretty seamless.  I'm not sure of the speed
difference, but the fact that Google only provides PB support in three
languages seems to me to be a bit of a hindrance.

Thoughts?

--Andrew


Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
I am using Riak via the Erlang Client API (PB), and I was storing my
documents as JSON and then converting them to records when I pull them out
of Riak, but I got to thinking that maybe this isn't the greatest approach.
I'm thinking that maybe it's better to store documents as the record
itself (Erlang external term binary) and then just convert the binary back
to the record when I pull them from Riak.  I was wondering what the
pros/cons are with this approach.  Here's my list so far:

Pros:

Native Erlang is stored, so less time to convert to the record
Better support for nested records
Smaller storage requirements and hence faster on the wire (?)

Cons:

Not readable through Rekon (or other utils) without modification
Can't use standard M/R functions which analyze the document (have to write
all custom functions using Erlang)
Not portable across languages

Thanks,

Andrew
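
A sketch of the record-as-binary approach, assuming the Erlang PB client
(the user record and bucket are illustrative):

    -record(user, {id, email, name}).

    store_user(Pid, User = #user{id = Id}) ->
        Obj = riakc_obj:new(<<"user">>, Id, term_to_binary(User),
                            "application/x-erlang-binary"),
        riakc_pb_socket:put(Pid, Obj).

    fetch_user(Pid, Id) ->
        {ok, Obj} = riakc_pb_socket:get(Pid, <<"user">>, Id),
        binary_to_term(riakc_obj:get_value(Obj)).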


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Ah, yes, you're right.  Basically I'd have to either update all previous
record docs with the new field or I'd have to have multiple record
implementations to support the history of that particular record.  That
could be really, really ugly.

Thanks Sean!

On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:

> Andrew,
>
> I think you're on the right track here, but I might add that you'll want to
> have upgrade paths available if you're using records -- that is, version
> them -- so that you can evolve their structure over time.  That could be a
> little hairy unless done carefully.
>
> That said, you could use BERT as the serialization format, making
> implementing JavaScript M/R functions a little easier, and interop with
> other languages.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>
> > I am using Riak using the Erlang Client API (PB) and I was storing my
> documents as JSON and then converting them to records when I pull them out
> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>  I'm thinking that maybe it's better to store documents just as the record
> itself (Erlang binary) and then just converting the binary back to the
> record when I pull them from Riak.  I was wondering what the pros/cons are
> to this approach.  Here's my list so far:
> >
> > Pros:
> >
> > Native Erlang is stored, so less time to convert to the record
> > Better support for nested records
> > Smaller storage requirements and hence faster on the wire (?)
> >
> > Cons:
> >
> > Not readable through Rekon (or other utils) without modification
> > Can't use standard M/R functions which analyze the document (have to
> write all custom functions using Erlang)
> > Not portable across languages
> >
> > Thanks,
> >
> > Andrew
>
>


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Cool, I've looked at BSON before for another project, and it might make
sense in this case as well.

Thanks!

On Thu, Jun 9, 2011 at 2:26 PM, Will Moss  wrote:

> Hey Andrew,
>
> We're using BSON (bsonspec.org), because it stores binary (and other) data
> types better than JSON and is also faster and more wire efficient (sounds
> like about the same reasons you're considering leaving JSON). There are also
> libraries to parse BSON in just about every language.
>
> I haven't tried using it in an Erlang map-reduce yet (we don't do
> map-reduces for any of our production work), but there is a library out
> there so it shouldn't be too hard.
>
> Will
>
>
> On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:
>
>> Andrew,
>>
>> I think you're on the right track here, but I might add that you'll want
>> to have upgrade paths available if you're using records -- that is, version
>> them -- so that you can evolve their structure over time.  That could be a
>> little hairy unless done carefully.
>>
>> That said, you could use BERT as the serialization format, making
>> implementing JavaScript M/R functions a little easier, and interop with
>> other languages.
>>
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>>
>> > I am using Riak using the Erlang Client API (PB) and I was storing my
>> documents as JSON and then converting them to records when I pull them out
>> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>>  I'm thinking that maybe it's better to store documents just as the record
>> itself (Erlang binary) and then just converting the binary back to the
>> record when I pull them from Riak.  I was wondering what the pros/cons are
>> to this approach.  Here's my list so far:
>> >
>> > Pros:
>> >
>> > Native Erlang is stored, so less time to convert to the record
>> > Better support for nested records
>> > Smaller storage requirements and hence faster on the wire (?)
>> >
>> > Cons:
>> >
>> > Not readable through Rekon (or other utils) without modification
>> > Can't use standard M/R functions which analyze the document (have to
>> write all custom functions using Erlang)
>> > Not portable across languages
>> >
>> > Thanks,
>> >
>> > Andrew
>>
>>
>>
>
>


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Well, I'd rather not do it that way and convert it to a string.  But
another thing I can do is convert the record to a proplist and then store
that in the database.  When I pull it out of the database, I would loop
through the fields of the record definition, using each field as a key
into the proplist to get the value back out.  This would avoid the
issue Sean raised with storing a record directly.
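
A sketch of that round trip (the record definition is illustrative); note
that fields added to the record later simply come back as undefined when
reading older stored proplists:

    -record(user, {id, email, name}).

    user_to_proplist(User = #user{}) ->
        lists:zip(record_info(fields, user), tl(tuple_to_list(User))).

    proplist_to_user(Props) ->
        Values = [proplists:get_value(F, Props)
                  || F <- record_info(fields, user)],
        list_to_tuple([user | Values]).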

On Thu, Jun 9, 2011 at 2:41 PM, Evans, Matthew  wrote:

>  Hi,
>
>
>
> Why not convert your term to a string?  Then you can do map/reduce, can't
> you?
>
>
>
> Term to a string...
>
> 1> Term = [{one,1},{two,2},{three,3}].
> [{one,1},{two,2},{three,3}]
> 2> String = lists:flatten(io_lib:format("~p.", [Term])).
> "[{one,1},{two,2},{three,3}]."
>
> Save "String" in riak...
>
> Then back to a term...
>
> 3> String = "[{one,1},{two,2},{three,3}].".
> "[{one,1},{two,2},{three,3}]."
> 4> {ok,Tok,_} = erl_scan:string(String).
> 5> {ok,Term} = erl_parse:parse_term(Tok).
> {ok,[{one,1},{two,2},{three,3}]}
>
>
>
> /Matt
>
>
>  --
>
> From: riak-users-boun...@lists.basho.com [mailto:
> riak-users-boun...@lists.basho.com] On Behalf Of Will Moss
> Sent: Thursday, June 09, 2011 5:27 PM
> To: Sean Cribbs
> Cc: riak-users
> Subject: Re: Pros/Cons to not storing JSON
>
>
>
> Hey Andrew,
>
>
>
> We're using BSON (bsonspec.org), because it stores binary (and other) data
> types better than JSON and is also faster and more wire efficient (sounds
> like about the same reasons you're considering leaving JSON). There are also
> libraries to parse BSON it in just about every language.
>
>
>
> I haven't tried using it in a Erlang map-reduce yet (we don't do
> map-reduces for any of our production work), but there is a library out
> there so it shouldn't be too hard.
>
>
>
> Will
>
>
>
> On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:
>
> Andrew,
>
> I think you're on the right track here, but I might add that you'll want to
> have upgrade paths available if you're using records -- that is, version
> them -- so that you can evolve their structure over time.  That could be a
> little hairy unless done carefully.
>
> That said, you could use BERT as the serialization format, making
> implementing JavaScript M/R functions a little easier, and interop with
> other languages.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
>
> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>
> > I am using Riak using the Erlang Client API (PB) and I was storing my
> documents as JSON and then converting them to records when I pull them out
> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>  I'm thinking that maybe it's better to store documents just as the record
> itself (Erlang binary) and then just converting the binary back to the
> record when I pull them from Riak.  I was wondering what the pros/cons are
> to this approach.  Here's my list so far:
> >
> > Pros:
> >
> > Native Erlang is stored, so less time to convert to the record
> > Better support for nested records
> > Smaller storage requirements and hence faster on the wire (?)
> >
> > Cons:
> >
> > Not readable through Rekon (or other utils) without modification
> > Can't use standard M/R functions which analyze the document (have to
> write all custom functions using Erlang)
> > Not portable across languages
> >
> > Thanks,
> >
> > Andrew
>
>
>
>
>
>
>
>


Link Walking via Map Reduce

2011-06-22 Thread Andrew Berman
Hello,

I'm having issues link walking using the Map Reduce link function.  I am
using HEAD from Git, so it's possible that's the issue, but here is what is
happening.

I've got two buckets, user and user_email where user_email contains a link
to the user.

When I run this:

{
"inputs": [
[
"user_email",
"myem...@email.com"
]
],
"query": [
{
"link": {
"bucket": "user",
"tag": "user"
}
}
]
}

I only get [["user","LikiWUPJSFuxtrhCYpsPfg","user"]] returned.  The second
I add a map function, even the simplest one (function(v) { return [v]; }), I
get a "map_reduce error":

{
"inputs": [
[
"user_email",
"myem...@email.com"
]
],
"query": [
{
"link": {"bucket":"user", "tag":"user"}
}
   ,{
"map": {
"language": "javascript",
"source": "function(v) { return[v]; }"
}
}
]
}

Is this functionality broken?  I am following what it says on the Wiki for
the MapRed version of link walking.  When I use HTTP link walking, it works
correctly.

Thanks,

Andrew


Re: Link Walking via Map Reduce

2011-06-22 Thread Andrew Berman
Hey Ryan,

Here is the error from the sasl log.  It looks like some sort of
encoding error.  Any thoughts on how to fix this?  I am storing the
data as BERT-encoded binary, and I set the content type as
application/octet-stream.

Thanks for your help!

Andrew

=ERROR REPORT==== 9-Jun-2011::21:37:05 ===
** Generic server <0.5996.21> terminating
** Last message in was {batch_dispatch,
                        {map,
                         {jsanon,<<"function(value) {return [value];}">>},
                         [{struct,
                           [{<<"bucket">>,<<"user">>},
                            {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
                            {<<"vclock">>,

<<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
                            {<<"values">>,
                             [{struct,
                               [{<<"metadata">>,
                                 {struct,
                                  [{<<"X-Riak-VTag">>,
                                    <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
                                   {<<"X-Riak-Last-Modified">>,
                                    <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
                                {<<"data">>,

<<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]},
  <<"user">>,none]}}
** When Server state == {state,<0.143.0>,riak_kv_js_map,#Port<0.92614>,true}
** Reason for termination ==
** {function_clause,[{js_driver,eval_js,
                      [#Port<0.92614>,{error,bad_encoding},5000]},
                     {riak_kv_js_vm,invoke_js,2},
                     {riak_kv_js_vm,define_invoke_anon_js,3},
                     {riak_kv_js_vm,handle_call,3},
                     {gen_server,handle_msg,5},
                     {proc_lib,init_p_do_apply,3}]}

=CRASH REPORT==== 9-Jun-2011::21:37:05 ===
  crasher:
    initial call: riak_kv_js_vm:init/1
    pid: <0.5996.21>
    registered_name: []
    exception exit:
{function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_js_sup,riak_kv_sup,<0.128.0>]
    messages: []
    links: [<0.142.0>,<0.6009.21>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4181
    stack_size: 24
    reductions: 2586
  neighbours:
    neighbour: 
[{pid,<0.6009.21>},{registered_name,[]},{initial_call,{riak_kv_mapper,init,[Argument__1]}},{current_function,{gen,do_call,4}},{ancestors,[riak_kv_mapper_sup,riak_kv_sup,<0.128.0>]},{messages,[]},{links,[<0.5996.21>,<12337.6227.21>,<0.162.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,987},{stack_size,53},{reductions,1043}]
=SUPERVISOR REPORT==== 9-Jun-2011::21:37:05 ===
     Supervisor: {local,riak_kv_js_sup}
     Context:    child_terminated
     Reason:
{function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
     Offender:
[{pid,<0.5996.21>},{name,undefined},{mfargs,{riak_kv_js_vm,start_link,undefined}},{restart_type,temporary},{shutdown,2000},{child_type,worker}]

On Wed, Jun 22, 2011 at 6:10 PM, Ryan Zezeski  wrote:
>
> Andrew,
> Maybe you could elaborate on the error?  I tested this against master (commit 
> below) just now with success.
> 2b1a474f836d962fa035f48c05452e22fc6c2193 Change dependency to allow for 
> R14B03 as well as R14B02
> -Ryan
> On Wed, Jun 22, 2011 at 7:03 PM, Andrew Berman  wrote:
>>
>> Hello,
>> I'm having issues link walking using the Map Reduce link function.  I am 
>> using HEAD from Git, so it's possible that's the issue, but here is what is 
>> happening.
>> I've got two buckets, user and user_email where user_email contains a link 
>> to the user.
>> When I run this:
>> {
>>     "inputs": [
>>         [
>>             "user_email",
>>             "myem...@email.com"
>>         ]
>>     ],
>>     "query": [
>>         {
>>             "link": {
>>                 "bucket": "user",
>>                 "tag&q

Re: Link Walking via Map Reduce

2011-06-23 Thread Andrew Berman
Mathias,

I thought Riak was content-agnostic when it came to the data being
stored?  The map phase is not running Riak.mapValuesJson, so why is
the data itself going through the JSON parser?  The JSON value
returned by v with all the info is valid, and I see the struct atom in
there so mochijson2 can parse it properly, but I'm not clear on why
mochijson2 would be coughing at the data part.

--Andrew

On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
> Andrew,
>
> you're indeed hitting a JSON encoding problem here. BERT is binary data, and 
> won't make the JSON parser happy when trying to deserialize it, before 
> handing it into the map phase. You have two options here, and none of them 
> will involve JavaScript as the MapReduce language.
>
> 1.) Use the Protobuff API, use Erlang functions to return the value or object 
> (e.g. riak_mapreduce:map_object_value or riak_kv_mapreduce:map_identity), and 
> then run MapReduce queries with the content type 
> 'application/x-erlang-binary'. However, you're constrained by client 
> libraries here, e.g. Ruby and Python don't support this content type for 
> MapReduce on the Protobuffs interface yet, so you'd either implement 
> something custom, or resort to a client that does, riak-erlang-client comes 
> to mind, though it was proven to be possible using the Java client too, 
> thanks to Russell. See [1] and [2]
>
> 2.) Convert the result from BERT into a JSON-parseable structure inside an 
> Erlang map function, before it's returned to the client.
>
> The second approach certainly is less restrictive in terms of API usage, but 
> certainly involves some overhead inside of the MapReduce request itself, but 
> is less prone to encoding/decoding issues with JSON.
>
> Mathias Meyer
> Developer Advocate, Basho Technologies
>
> [1] 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004447.html
> [2] 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004485.html
>
> On Donnerstag, 23. Juni 2011 at 07:59, Andrew Berman wrote:
>
>> Hey Ryan,
>>
>> Here is the error from the sasl log. It looks like some sort of
>> encoding error. Any thoughts on how to fix this? I am storing the
>> data as BERT encoded binary and I set the content-type as
>> application/octet-stream.
>>
>> Thanks for your help!
>>
>> Andrew
>>
>> ERROR REPORT 9-Jun-2011::21:37:05 ===
>> ** Generic server <0.5996.21> terminating
>> ** Last message in was {batch_dispatch,
>>  {map,
>> {jsanon,<<"function(value) {return [value];}">>},
>> [{struct,
>> [{<<"bucket">>,<<"user">>},
>>  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
>>  {<<"vclock">>,
>>
>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
>>  {<<"values">>,
>> [{struct,
>> [{<<"metadata">>,
>> {struct,
>>  [{<<"X-Riak-VTag">>,
>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
>> {<<"X-Riak-Last-Modified">>,
>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
>>  {<<"data">>,
>>
>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]},
>> <<"user">>,none]}}
>> ** When Server state == {state,<0.143.0>,riak_kv_js_map,#Port<0.92614>,true}
>> ** Reason for termination ==
>> ** {function_clause,[{js_driver,eval_js,
>>  [#Port<0.92614>,{error,bad_encoding},5000]},
>>  {riak_kv_js_vm,invoke_js,2},
>>  {riak_kv_js_vm,define_invoke_anon_js,3},
>>  {riak_kv_js_vm,handle_call,3},
>>  {gen_server,handle_msg,5},
>>  {proc_lib,init_p_do_apply,3}]}
>>
>> =CRASH REPORT 9-Jun-2011::21:37:05 ===
>>  crasher:
>>  initial call: riak_kv_js_vm:init/1
>>  pid: <0.5996.21>
>>  registered_name: []
>>  exception exit:
>> {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
>>  in function gen_server:terminate/6
>>  in call from proc_lib:init_p_do_apply/3
>>  ancestors: [riak_kv_js_sup,riak_kv_sup,<0.128.0>]
>>  messages: []
>>  links: [<0.142.0>,<0.6009.21>]
>>  dictionary: []
>>  trap_exit: false
>>  status: running
>>  heap_size: 4181
>>  stack_size: 24
>>  reducti

Re: Link Walking via Map Reduce

2011-06-23 Thread Andrew Berman
But isn't the value itself JSON?  Meaning this part:

{struct,
 [{<<"bucket">>,<<"user">>},
  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
  {<<"vclock">>,
   <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
  {<<"values">>,
   [{struct,
     [{<<"metadata">>,
       {struct,
        [{<<"X-Riak-VTag">>,<<"1KnL9Dlma9Yg4eMhRuhwtx">>},
         {<<"X-Riak-Last-Modified">>,<<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
      {<<"data">>,
       <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]}

So the only thing that is not JSON is the data itself, but when I get
the value, shouldn't I be getting the all the info above which is JSON
encoded?

Thank you all for your help,

Andrew

On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs  wrote:
> The object has to be JSON-encoded to be marshalled into the Javascript VM,
> and also on the way out if the Accept header indicates application/json.  So
> you have two places where it needs to be encodable into JSON.
> On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman  wrote:
>>
>> Mathias,
>>
>> I thought Riak was content agnostic when it came to the data being
>> stored?  The map phase is not running Riak.mapValuesJson, so why is
>> the data itself going through the JSON parser?  The JSON value
>> returned by v with all the info is valid and I see the struct atom in
>> there so mochijson2 can parse it properly, but I'm not clear why
>> mochijson2 would be coughing at the data part.
>>
>> --Andrew
>>
>> On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
>> > Andrew,
>> >
>> > you're indeed hitting a JSON encoding problem here. BERT is binary data,
>> > and won't make the JSON parser happy when trying to deserialize it, before
>> > handing it into the map phase. You have two options here, and none of them
>> > will involve JavaScript as the MapReduce language.
>> >
>> > 1.) Use the Protobuff API, use Erlang functions to return the value or
>> > object (e.g. riak_mapreduce:map_object_value or
>> > riak_kv_mapreduce:map_identity), and then run MapReduce queries with the
>> > content type 'application/x-erlang-binary'. However, you're constrained by
>> > client libraries here, e.g. Ruby and Python don't support this content type
>> > for MapReduce on the Protobuffs interface yet, so you'd either implement
>> > something custom, or resort to a client that does, riak-erlang-client comes
>> > to mind, though it was proven to be possible using the Java client too,
>> > thanks to Russell. See [1] and [2]
>> >
>> > 2.) Convert the result from BERT into a JSON-parseable structure inside
>> > an Erlang map function, before it's returned to the client.
>> >
>> > The second approach certainly is less restrictive in terms of API usage,
>> > but certainly involves some overhead inside of the MapReduce request 
>> > itself,
>> > but is less prone to encoding/decoding issues with JSON.
>> >
>> > Mathias Meyer
>> > Developer Advocate, Basho Technologies
>> >
>> > [1]
>> > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004447.html
>> > [2]
>> > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004485.html
>> >
>> > On Donnerstag, 23. Juni 2011 at 07:59, Andrew Berman wrote:
>> >
>> >> Hey Ryan,
>> >>
>> >> Here is the error from the sasl log. It looks like some sort of
>> >> encoding error. Any thoughts on how to fix this? I am storing the
>> >> data as BERT encoded binary and I set the content-type as
>> >> application/octet-stream.
>> >>
>> >> Thanks for your help!
>> >>
>> >> Andrew
>> >>
>> >> ERROR REPORT 9-Jun-2011::21:37:05 ===
>> >> ** Generic server <0.5996.21> terminating
>> >> ** Last message in was {batch_dispatch,
>> >>  {map,
>> >> {jsanon,<<"function(value) {return [

Re: Link Walking via Map Reduce

2011-06-23 Thread Andrew Berman
Ah, ok, that makes sense.  One more question, when I use the HTTP link
walking, I do get the data back as expected, so is there a way to
replicate this in a Map-Reduce job or using the Erlang PBC (which I
forgot to mention is what I'm using and the reason I'm not using the
HTTP link walking method)?

--Andrew

On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer  wrote:
> Andrew,
>
> the data looks like JSON, but it's not valid JSON. Have a look at the list 
> that's in the data section (which is your BERT encoded data), the first 
> character in that list is 131, which is not a valid UTF-8 character, and JSON 
> only allows valid UTF-8 characters. With a binary-encoded format, there's 
> always a chance for a control character like that to blow up the JSON 
> generated before and after the MapReduce code is executed. With JSON, content 
> agnosticism only goes as far as the set of legal characters allows.
>
> On a side note, if the data were a valid representation of a string, you 
> would see it as a string in the log file as well, not just as a list of 
> numbers.
>
> Mathias Meyer
> Developer Advocate, Basho Technologies
>
>
> On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote:
>
>> But isn't the value itself JSON? Meaning this part:
>>
>> {struct,
>>  [{<<"bucket">>,<<"user">>},
>>  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
>>  {<<"vclock">>,
>>
>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
>>  {<<"values">>,
>>  [{struct,
>>  [{<<"metadata">>,
>>  {struct,
>>  [{<<"X-Riak-VTag">>,
>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
>>  {<<"X-Riak-Last-Modified">>,
>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
>>  {<<"data">>,
>>
>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
>>
>> So the only thing that is not JSON is the data itself, but when I get
>> the value, shouldn't I be getting all the info above, which is
>> JSON-encoded?
>>
>> Thank you all for your help,
>>
>> Andrew
>>
>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs  wrote:
>> > The object has to be JSON-encoded to be marshalled into the Javascript VM,
>> > and also on the way out if the Accept header indicates application/json. So
>> > you have two places where it needs to be encodable into JSON.
>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman  wrote:
>> > >
>> > > Mathias,
>> > >
>> > > I thought Riak was content agnostic when it came to the data being
>> > > stored? The map phase is not running Riak.mapValuesJson, so why is
>> > > the data itself going through the JSON parser? The JSON value
>> > > returned by v with all the info is valid and I see the struct atom in
>> > > there so mochijson2 can parse it properly, but I'm not clear why
>> > > mochijson2 would be coughing at the data part.
>> > >
>> > > --Andrew
>> > >
>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
>> > > > Andrew,
>> > > >
>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary 
>> > > > data,
>> > > > and won't make the JSON parser happy when trying to deserialize it, 
>> > > > before
>> > > > handing it into the map phase. You have two options here, and none of 
>> > > > them
>> > > > will involve JavaScript as the MapReduce language.
>> > > >
>> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or
>> > > > object (e.g. riak_mapreduce:map_object_value or
>> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries with 
>> > > > the
>> > > > content type 'application/x-erlang-binary'. However, you're 
>> > > > constrained by
>> > > > client libraries here, e.g. Ruby and Python don't support this content 
>> > > > type
>> > > > for MapReduce on the Protobuffs interface yet, so you'd either 
>> > > > implement
>> > > > something custom, or resort to a client that does, riak-erl

Re: Link Walking via Map Reduce

2011-06-23 Thread Andrew Berman
Yes, I am able to do that, but I feel like this completely defeats the
purpose of a link by having to do two different calls.  I might as
well just store the user id in the data for user_email instead and not
use a link at all with your method.  What advantage does a link offer
at that point?

On Thu, Jun 23, 2011 at 8:55 AM, Jeremiah Peschka
 wrote:
> HTTP link walking will get you back the data in the way that you'd expect.
> It's a two-step process using PBC. MR link phases will give you a list of
> [bucket, key, tag] that you can then use to pull back the records from
> Riak.
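>
> As a rough sketch of that two-step flow with the Erlang PB client
> (bucket, key and tag names are made up):
>
> {ok, [{_, Links}]} = riakc_pb_socket:mapred(
>     Pid, [{<<"user_email">>, Email}],
>     [{link, <<"user">>, '_', true}]),
> %% each link result is [Bucket, Key, Tag]; fetch the objects yourself
> Users = [riakc_pb_socket:get(Pid, B, K) || [B, K, _Tag] <- Links].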
> ---
> Jeremiah Peschka
> Founder, Brent Ozar PLF, LLC
>
> On Thursday, June 23, 2011 at 8:52 AM, Andrew Berman wrote:
>
> Ah, ok, that makes sense. One more question, when I use the HTTP link
> walking, I do get the data back as expected, so is there a way to
> replicate this in a Map-Reduce job or using the Erlang PBC (which I
> forgot to mention is what I'm using and the reason I'm not using the
> HTTP link walking method)?
>
> --Andrew
>
> On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer  wrote:
>
> Andrew,
>
> the data looks like JSON, but it's not valid JSON. Have a look at the list
> that's in the data section (which is your BERT encoded data), the first
> character in that list is 131, which is not a valid UTF-8 character, and
> JSON only allows valid UTF-8 characters. With a binary-encoded format,
> there's always a chance for a control character like that to blow up the
> JSON generated before and after the MapReduce code is executed. With JSON,
> content agnosticism only goes as far as the set of legal characters allows.
>
> On a side note, if the data were a valid representation of a string, you
> would see it as a string in the log file as well, not just as a list of
> numbers.
>
> Mathias Meyer
> Developer Advocate, Basho Technologies
>
>
> On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote:
>
> But isn't the value itself JSON? Meaning this part:
>
> {struct,
>  [{<<"bucket">>,<<"user">>},
>  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
>  {<<"vclock">>,
>
> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
>  {<<"values">>,
>  [{struct,
>  [{<<"metadata">>,
>  {struct,
>  [{<<"X-Riak-VTag">>,
> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
>  {<<"X-Riak-Last-Modified">>,
> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
>  {<<"data">>,
>
> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
>
> So the only thing that is not JSON is the data itself, but when I get
> the value, shouldn't I be getting all the info above, which is
> JSON-encoded?
>
> Thank you all for your help,
>
> Andrew
>
> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs  wrote:
>
> The object has to be JSON-encoded to be marshalled into the Javascript VM,
> and also on the way out if the Accept header indicates application/json. So
> you have two places where it needs to be encodable into JSON.
> On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman  wrote:
>
> Mathias,
>
> I thought Riak was content agnostic when it came to the data being
> stored? The map phase is not running Riak.mapValuesJson, so why is
> the data itself going through the JSON parser? The JSON value
> returned by v with all the info is valid and I see the struct atom in
> there so mochijson2 can parse it properly, but I'm not clear why
> mochijson2 would be coughing at the data part.
>
> --Andrew
>
> On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
>
> Andrew,
>
> you're indeed hitting a JSON encoding problem here. BERT is binary data,
> and won't make the JSON parser happy when trying to deserialize it, before
> handing it into the map phase. You have two options here, and none of them
> will involve JavaScript as the MapReduce language.
>
> 1.) Use the Protobuff API, use Erlang functions to return the value or
> object (e.g. riak_mapreduce:map_object_value or
> riak_kv_mapreduce:map_identity), and then run MapReduce queries with the
> content type 'application/x-erlang-binary'. However, you're constrained by
> client libraries here, e.g. Ruby and Python don't support this content type
> for MapReduce on the Protobuffs interface yet, so you'd either implement
> something custom, or resort to a client that does, riak-erlang-client 

Re: Link Walking via Map Reduce

2011-06-24 Thread Andrew Berman
Mathias,

I took the BERT encoding and then encoded that as Base64 which should
pass the test of valid UTF-8 characters.  However, now I'm starting to
think that maybe doing two encodings and storing that for the purpose
of saving space is not worth the trade-off in performance vs just
storing the data in JSON format.  Do you guys have any thoughts on
this?
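
For reference, the double encoding I mean is just this (sketch):

Stored = base64:encode(term_to_binary(Record)),  %% BERT, then Base64 (JSON-safe)
Record = binary_to_term(base64:decode(Stored)).  %% asserts it round-trips back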

Thanks,

Andrew

On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer  wrote:
> Andrew,
>
> the data looks like JSON, but it's not valid JSON. Have a look at the list 
> that's in the data section (which is your BERT encoded data), the first 
> character in that list is 131, which is not a valid UTF-8 character, and JSON 
> only allows valid UTF-8 characters. With a binary-encoded format, there's 
> always a chance for a control character like that to blow up the JSON 
> generated before and after the MapReduce code is executed. With JSON, content 
> agnosticism only goes as far as the set of legal characters allows.
>
> On a side note, if the data were a valid representation of a string, you 
> would see it as a string in the log file as well, not just as a list of 
> numbers.
>
> Mathias Meyer
> Developer Advocate, Basho Technologies
>
>
> On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote:
>
>> But isn't the value itself JSON? Meaning this part:
>>
>> {struct,
>>  [{<<"bucket">>,<<"user">>},
>>  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
>>  {<<"vclock">>,
>>
>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
>>  {<<"values">>,
>>  [{struct,
>>  [{<<"metadata">>,
>>  {struct,
>>  [{<<"X-Riak-VTag">>,
>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
>>  {<<"X-Riak-Last-Modified">>,
>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
>>  {<<"data">>,
>>
>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
>>
>> So the only thing that is not JSON is the data itself, but when I get
>> the value, shouldn't I be getting the all the info above which is JSON
>> encoded?
>>
>> Thank you all for your help,
>>
>> Andrew
>>
>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs  wrote:
>> > The object has to be JSON-encoded to be marshalled into the Javascript VM,
>> > and also on the way out if the Accept header indicates application/json. So
>> > you have two places where it needs to be encodable into JSON.
>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman  wrote:
>> > >
>> > > Mathias,
>> > >
>> > > I thought Riak was content agnostic when it came to the data being
>> > > stored? The map phase is not running Riak.mapValuesJson, so why is
>> > > the data itself going through the JSON parser? The JSON value
>> > > returned by v with all the info is valid and I see the struct atom in
>> > > there so mochijson2 can parse it properly, but I'm not clear why
>> > > mochijson2 would be coughing at the data part.
>> > >
>> > > --Andrew
>> > >
>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
>> > > > Andrew,
>> > > >
>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary 
>> > > > data,
>> > > > and won't make the JSON parser happy when trying to deserialize it, 
>> > > > before
>> > > > handing it into the map phase. You have two options here, and none of 
>> > > > them
>> > > > will involve JavaScript as the MapReduce language.
>> > > >
>> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or
>> > > > object (e.g. riak_mapreduce:map_object_value or
>> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries with 
>> > > > the
>> > > > content type 'application/x-erlang-binary'. However, you're 
>> > > > constrained by
>> > > > client libraries here, e.g. Ruby and Python don't support this content 
>> > > > type
>> > > > for MapReduce on the Protobuffs interface yet, so you'd either 
>> > > > implement
>>

Re: Link Walking via Map Reduce

2011-06-24 Thread Andrew Berman
And related, does Bitcask have any sort of compression built into it?

On Fri, Jun 24, 2011 at 2:58 PM, Andrew Berman  wrote:
> Mathias,
>
> I took the BERT encoding and then encoded that as Base64 which should
> pass the test of valid UTF-8 characters.  However, now I'm starting to
> think that maybe doing two encodings and storing that for the purpose
> of saving space is not worth the trade-off in performance vs just
> storing the data in JSON format.  Do you guys have any thoughts on
> this?
>
> Thanks,
>
> Andrew
>
> On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer  wrote:
>> Andrew,
>>
>> the data looks like JSON, but it's not valid JSON. Have a look at the list 
>> that's in the data section (which is your BERT encoded data), the first 
>> character in that list is 131, which is not a valid UTF-8 character, and 
>> JSON only allows valid UTF-8 characters. With a binary-encoded format, 
>> there's always a chance for a control character like that to blow up the 
>> JSON generated before and after the MapReduce code is executed. With JSON, 
>> content agnosticism only goes as far as the set of legal characters allows.
>>
>> On a side note, if the data were a valid representation of a string, you 
>> would see it as a string in the log file as well, not just as a list of 
>> numbers.
>>
>> Mathias Meyer
>> Developer Advocate, Basho Technologies
>>
>>
>> On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote:
>>
>>> But isn't the value itself JSON? Meaning this part:
>>>
>>> {struct,
>>>  [{<<"bucket">>,<<"user">>},
>>>  {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
>>>  {<<"vclock">>,
>>>
>>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
>>>  {<<"values">>,
>>>  [{struct,
>>>  [{<<"metadata">>,
>>>  {struct,
>>>  [{<<"X-Riak-VTag">>,
>>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
>>>  {<<"X-Riak-Last-Modified">>,
>>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
>>>  {<<"data">>,
>>>
>>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
>>>
>>> So the only thing that is not JSON is the data itself, but when I get
>>> the value, shouldn't I be getting all the info above, which is
>>> JSON-encoded?
>>>
>>> Thank you all for your help,
>>>
>>> Andrew
>>>
>>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs  wrote:
>>> > The object has to be JSON-encoded to be marshalled into the Javascript VM,
>>> > and also on the way out if the Accept header indicates application/json. 
>>> > So
>>> > you have two places where it needs to be encodable into JSON.
>>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman  wrote:
>>> > >
>>> > > Mathias,
>>> > >
>>> > > I thought Riak was content agnostic when it came to the data being
>>> > > stored? The map phase is not running Riak.mapValuesJson, so why is
>>> > > the data itself going through the JSON parser? The JSON value
>>> > > returned by v with all the info is valid and I see the struct atom in
>>> > > there so mochijson2 can parse it properly, but I'm not clear why
>>> > > mochijson2 would be coughing at the data part.
>>> > >
>>> > > --Andrew
>>> > >
>>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer  wrote:
>>> > > > Andrew,
>>> > > >
>>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary 
>>> > > > data,
>>> > > > and won't make the JSON parser happy when trying to deserialize it, 
>>> > > > before
>>> > > > handing it into the map phase. You have two options here, and none of 
>>> > > > them
>>> > > > will involve JavaScript as the MapReduce language.
>>> > > >
>>> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or
>>> > > > object (e.g. riak_mapreduce:map_object_value or
>>>

Re: Link Walking via Map Reduce

2011-06-24 Thread Andrew Berman
Has there been any talk of using compression, maybe something like
Snappy (http://code.google.com/p/snappy/) since it's fast and
shouldn't affect performance too much?

On Fri, Jun 24, 2011 at 3:29 PM, Aphyr  wrote:
> Nope.
>
> On 06/24/2011 03:24 PM, Andrew Berman wrote:
>>
>> And related, does Bitcask have any sort of compression built into it?
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Controlling order of results from link phase

2011-06-26 Thread Andrew Berman
I've noticed that when I run the link function, it automatically
orders the links based on Id.  Is there a way to tell it not to sort
the links?  In other words, I want the links in the order in which
they were put in the list (most recent at the head of the list) and I
see from Rekon that that is how they are being stored.  Basically, I'm
running a reduce_slice on the result of the link phase so that Riak
doesn't load up all the objects to which the links point.  If
the answer is no, I cannot control the order of the links, is the only
option to prepend something like the time (e.g. millis since
1-1-1970)?
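
For what it's worth, the time prefix I have in mind is something like
this (a sketch; BaseKey stands in for whatever key I use today, and
for most-recent-first I suppose I'd subtract the millis from some
maximum so the ascending sort comes out reversed):

{Mega, Sec, Micro} = erlang:now(),
Millis = (Mega * 1000000 + Sec) * 1000 + Micro div 1000,
Key = iolist_to_binary([integer_to_list(Millis), "_", BaseKey]).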

Thanks,

Andrew

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Controlling order of results from link phase

2011-06-26 Thread Andrew Berman
I think I found the answer and it is no, I cannot control the sort.  I
found the code here in riak_kv_wm_link_walker.erl:

links(Object) ->
    MDs = riak_object:get_metadatas(Object),
    lists:umerge(
      [ case dict:find(?MD_LINKS, MD) of
            {ok, L} ->
                [ [B,K,T] || {{B,K},T} <- lists:sort(L) ];
            error -> []
        end
        || MD <- MDs ]).

Why run lists:sort?  Shouldn't the sort be up to the user after he
gets the actual object?  I don't understand why the sort processing is
necessary at the link phase.  Thoughts?

--Andrew

On Sun, Jun 26, 2011 at 3:15 PM, Andrew Berman  wrote:
> I've noticed that when I run the link function, it automatically
> orders the links based on Id.  Is there a way to tell it not to sort
> the links?  In other words, I want the links in the order in which
> they were put in the list (most recent at the head of the list) and I
> see from Rekon that that is how they are being stored.  Basically, I'm
> running a reduce_slice on the result of the link phase so that Riak
> doesn't load up all the objects to which the links point.  If
> the answer is no, I cannot control the order of the links, is the only
> option to prepend something like the time (e.g. millis since
> 1-1-1970)?
>
> Thanks,
>
> Andrew
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Controlling order of results from link phase

2011-06-27 Thread Andrew Berman
Nevermind on the lists:sort issue below, as I realized that lists:umerge
requires that the lists be sorted.

Thanks anyway,

Andrew


On Sun, Jun 26, 2011 at 3:36 PM, Andrew Berman  wrote:
> I think I found the answer and it is no, I cannot control the sort.  I
> found the code here in riak_kv_wm_link_walker.erl:
>
> links(Object) ->
>    MDs = riak_object:get_metadatas(Object),
>    lists:umerge(
>      [ case dict:find(?MD_LINKS, MD) of
>            {ok, L} ->
>                [ [B,K,T] || {{B,K},T} <- lists:sort(L) ];
>            error -> []
>        end
>        || MD <- MDs ]).
>
> Why run lists:sort?  Shouldn't the sort be up to the user after he
> gets the actual object?  I don't understand why the sort processing is
> necessary at the link phase.  Thoughts?
>
> --Andrew
>
> On Sun, Jun 26, 2011 at 3:15 PM, Andrew Berman  wrote:
>> I've noticed that when I run the link function, it automatically
>> orders the links based on Id.  Is there a way to tell it not to sort
>> the links?  In other words, I want the links in the order in which
>> they were put in the list (most recent at the head of the list) and I
>> see from Rekon that that is how they are being stored.  Basically, I'm
>> running a reduce_slice on the result of the link phase so that Riak
>> doesn't load up all the objects to which the links point.  If
>> the answer is no, I cannot control the order of the links, is the only
>> option to prepend something like the time (e.g. millis since
>> 1-1-1970)?
>>
>> Thanks,
>>
>> Andrew
>>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How to setup Post-Commit Hooks

2011-06-29 Thread Andrew Berman
They should exist on each node (server-side).  You set up the code
path using http://wiki.basho.com/Configuration-Files.html#add_paths or
-pz in vm.args for each node.  Once you do that (assuming Erlang),
you would just compile the file and put the beams in the directory you
stated in the path and then start each node and you should be good to
go.  If using JavaScript, you want to set
http://wiki.basho.com/Configuration-Files.html#js_source_dir which
should be pretty self-explanatory.
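
For example, a minimal Erlang post-commit hook might look like this
(module and function names are made up; the hook is handed the object
and its return value is ignored):

-module(my_hooks).
-export([log_commit/1]).

%% attach via the bucket's postcommit property, e.g.
%% {"props":{"postcommit":[{"mod":"my_hooks","fun":"log_commit"}]}}
log_commit(Object) ->
    error_logger:info_msg("stored ~p/~p~n",
                          [riak_object:bucket(Object),
                           riak_object:key(Object)]),
    ok.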

--Andrew

On Tue, Jun 28, 2011 at 10:38 AM, Charles Daniel  wrote:
> I can't figure out how post-hooks are to be setup in Riak. I was wondering if 
> I could get an example of where to set it up (is it via the client or is it 
> on Riak's server side?). I've read this in the wiki already
> http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html#Post-Commit-Hooks
> but it doesn't exactly go into much detail of how/where to set it up.
>
> Thanks in advance
> -Chuck
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: New Java Client API

2011-06-29 Thread Andrew Berman
So much better than the old API!!  Thanks for the hard work on this.

On Tue, Jun 28, 2011 at 10:53 AM, Russell Brown  wrote:
> Hi,
> A while back (in March) I asked for help/feedback in defining a new API for
> the riak-java-client[1].  Last week I merged a branch to master of the basho
> riak-java-client repo[2] with that new API.
> Defining an API is hard. I like Joshua Bloch's advice:
>     "You can't please everyone so aim to displease everyone equally."
> Well, I didn't *aim* to displease anyone, but if you don't like fluent APIs
> and builders, you're not going to like this ;)
> I had two aims with this re-work
> 1. A common API for the different transports (HTTP/PB)
> 2. A set of strategies for dealing with fault tolerant, eventually
> consistent databases at the client layer
> The bulk of the inspiration for the latter aim came from this talk[3] by
> Coda Hale and Ryan Kennedy of Yammer (and some follow up emails) as well as
> from emails and advice on this list (from Kresten Krab and Jon Brisbin
> (amongst others.)) The team at Basho have been brilliant and very patient
> answering all my questions about the workings of Riak and the best way for
> the client to behave in a given situation. And Mathias Meyer is the best
> remote/IM rubber duck[4] in the world.
> That said, the implementation (and mistakes) are mainly mine. Please have a
> look at the new README, pull the code and play with it. There are some rough
> edges, and I have a long TODO list of tidying and extra features, so if you
> find a bug, need a feature or have any feedback at all please get in touch,
> or even fork the project and send a pull request ;)
> Cheers
> Russell
> [1]
> - http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-March/003599.html
> [2] - https://github.com/basho/riak-java-client
> [3] - http://blog.basho.com/2011/03/28/Riak-and-Scala-at-Yammer/ (if you
> haven't watched this yet, please do.)
> [4] - http://c2.com/cgi/wiki?RubberDucking
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak or Cassandra for this...

2011-06-29 Thread Andrew Berman
It seems to me that you need to put weights on your requirements
because I think it's going to be pretty tough to meet all of them with
just one solution.  For example, you can use something like Redis to
do fast writes, but it doesn't have Map-Reduce queries.  So, you can
use Redis to write the data and then have another program which moves
the data from Redis to Riak or Hadoop (look into Redis's awesome
Pub/Sub features), where you can then perform your Map-Reduce query.
Just my two cents.

--Andrew

On Tue, Jun 28, 2011 at 8:17 AM, Evans, Matthew  wrote:
> Hi,
>
> I've been looking at a number of technologies for a simple application.
>
> We are saving large amounts of data to disc; this data is event-log/sensor 
> data which may look something like:
>
> Version, Account, RequestID, Timestamp, Duration, IPAddr, Method, URL, HTTP 
> Version, Response_Code, Size, Hit_Rate, Range_From, Range_To, Referrer, 
> Agent, Content_Type, Accept_Encoding, Redirect_Code, Progress
>
>
> For Example:
>
> 1 agora 270509382712866522850368375 1289589216.893 1989.938 
> 79.7.41.29 GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 
> 953772216 725098308 713834308 -1 -1 - 
> Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) application/octet-stream gzip - 
> 0 progress
>
> The data has no specific key to index off (we will be doing some parsing of 
> the data on ingest to get basic information allowing for fast queries, but 
> this is outside of Riak).
>
> Really the issue is that we need to be able to apply "analytic" (map-reduce) 
> type queries on the data. These queries do not need to be real-time, but 
> should not take days to run.
>
> For example: All GET requests for a specific URL within a specific time range.
>
> The amount of data saved could be quite large (forcing us to use InnoDB 
> instead of BitCask). One estimate is ~1 billion records. Architecturally this 
> data could be split over multiple nodes.
>
> The choice of client-side language is still open, with Erlang as the current 
> favorite. As I see it the advantages of Riak are:
>
> 1) HTTP based API as well as Erlang and other client APIs (the system has a 
> mix of programming languages including Python and C/C++).
>
> 2) More flexible/extensible data model (Cassandra requires you to predefine 
> the key spaces, columns etc ahead of time)
>
> 3) Easier to install/setup without the apparent bloat and complexity of 
> Cassandra (which also includes Java setup)
>
> 4) Map-reduce queries
>
> The disadvantages of Riak are:
>
> 1) Write performance. We need to handle ~50,000 writes per second.
>
> I would recommend running our client app from within the same Erlang VM as 
> Riak so hopefully we can gain something here. Alternatively use innostore 
> Erlang API directly for writes.
>
> Questions:
>
> 1) Is Riak a good database for this application?
>
> 2) Can we write to InnoDB directly and still leverage the map-reduce queries 
> on the data?
>
> Regards
>
> Matt
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


When is Vclock generated?

2011-07-21 Thread Andrew Berman
I'm looking to store the vclock on my object to be used for
versioning.   Currently, when I get the object from Riak I fill in the
version with the vclock from Riak (which I use to determine if the
object is persistent and for passing back to Riak when putting) and
then when the object is saved it saves the version as the previous
vclock value.  I'm wondering when the vclock is actually generated.
Can I write a pre-commit hook that fills in the version so it has the
most updated value, or is there no way for me to do it?  It's not a
huge deal because the version value in the db is immediately updated
upon loading the data from Riak, but I just feel like it would make
things more consistent if I could have the version matching the
updated vclock.

Thanks for any help!

Andrew

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: When is Vclock generated?

2011-07-22 Thread Andrew Berman
Sean,

Thanks for the reply.  I am fetching before write; however, what I
want is the ability to fetch the vclock and use it as a version on the
object itself throughout my application, much like how Hibernate uses
a version property.  This way I can tell whether an object is
persistent based on whether it has a version or is undefined.  So
basically I created a version property on my record which, in turn,
gets stored in Riak, but this version will never match the actual
vclock when looking at the data itself, since the version is only ever
updated to the previous vclock and not the one after the object has
been updated.  Does that make more sense?  The version I store in Riak
is pretty worthless, so it's not a big deal if it doesn't match, since
I update it to the current one when I load the object.  But it would
certainly make things easier on me if I could always have the version
match the current vclock, so then I wouldn't have to update the
version on fetches.
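
In code, what I do after every fetch today is roughly this (a sketch;
#user is just my own record):

{ok, Obj} = riakc_pb_socket:get(Pid, <<"user">>, Key),
User = binary_to_term(riakc_obj:get_value(Obj)),
Persistent = User#user{version = riakc_obj:vclock(Obj)}.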

--Andrew

On Fri, Jul 22, 2011 at 6:06 AM, Sean Cribbs  wrote:
> Andrew,
> Are you trying to store the vclock as part of the value? I'm
> misunderstanding something.
> Well-behaved clients should always fetch before writing, so your client
> should have the most reasonably-fresh version of the object when you write.
>  There's really no way to guarantee that some other actor won't write a new
> version between the time that you fetch the object and store it back, or
> even in the time between your client issuing the write and it actually being
> written to disk.  Vector clocks and sibling values exist in part to help
> disambiguate those race conditions.  If you're submitting the write without
> the vclock, your write could very well be ignored, so please fetch before
> storing.
>
> On Thu, Jul 21, 2011 at 6:13 PM, Andrew Berman  wrote:
>>
>> I'm looking to store the vclock on my object to be used for
>> versioning.   Currently, when I get the object from Riak I fill in the
>> version with the vclock from Riak (which I use to determine if the
>> object is persistent and for passing back to Riak when putting) and
>> then when the object is saved it saves the version as the previous
>> vclock value.  I'm wondering when the vclock is actually generated.
>> Can I write a pre-commit hook that fills in the version so it has the
>> most updated value, or is there no way for me to do it?  It's not a
>> huge deal because the version value in the db is immediately updated
>> upon loading the data from Riak, but I just feel like it would make
>> things more consistent if I could have the version matching the
>> updated vclock.
>>
>> Thanks for any help!
>>
>> Andrew
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
> --
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://www.basho.com/
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Connection Pool with Erlang PB Client Necessary?

2011-07-25 Thread Andrew Berman
I know that this subject has been brought up before, but I'm still
wondering what the value of a connection pool is with Riak.  In my
app, I'm using Webmachine resources to talk to a gen_server which in
turn talks to Riak.  So, in other words, the Webmachine resources
never talk to Riak directly, they must always talk to the gen_server
to deal with Riak.  Since Erlang processes are so small and fast to
create, is there really any overhead in having the gen_server create a
new connection (with the same client id) each time it needs to access
Riak?

So the pseudo-code would look like this:

my_webmachine_resource.erl
==========================

some_service:persist(MyRecord).

some_service.erl
================

persist(MyRecord) ->
    riak_repository:load(LoadSomething),
    riak_repository:persist(MyRecord),
    riak_repository:persist(SomethingElse).

riak_repository.erl (this is the gen_server)
============================================

persist(...) -> call(...).
load(...) -> call(...).

call(DoAction) ->
    Pid = get_connection(ClientId),
    DoAction(Pid),
    close_connection(Pid). %% Is this even necessary?

Thoughts?

Another approach I thought of was:

some_service.erl
================

persist(SomeRecord) ->
    riak_repository:execute(fun(Pid) ->
        riak_repository:persist(SomeRecord, Pid),
        riak_repository:load(..., Pid)
    end).

riak_repository.erl
===================

execute(Fun) ->
    %% connect before the try so Pid is in scope in the after clause
    Pid = get_connection(),
    try
        Fun(Pid)
    after
        close_connection(Pid)
    end.

Is one of these approaches better than the other in dealing with Riak
and vclocks?

Thanks,

Andrew

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Andrew Berman
Thanks for the reply Bryan.  This all makes sense.  I am fairly new to
Erlang and wasn't sure if using a gen_server solved some of the issues
with connections.  From what I've seen a lot of people simply make
calls to Riak directly from a resource and so I thought having a
gen_server in front of Riak would help to manage things better.
Apparently it doesn't.

So, then, two more questions.  I have used connection pools in Java
like C3P0 and they can ramp up connections and then cull connections
when there is a period of inactivity.  The only pooler I've found that
does this is: https://github.com/seth/pooler .  Do you have any other
recommendations on connection poolers?

Second, I'm still a little confused on client ID.  I thought client Id
represented an actual client, not a connection.  So, in my case, the
gen_server is one client which makes multiple connections.  After
seeing what you wrote and reading a bit more on it, it seems like
client Id should just be some random string (base64 encoded) that
should be generated on creating a connection.  Is that right?

Thanks for your help!

Andrew

On Tue, Jul 26, 2011 at 9:39 AM, Bryan O'Sullivan  wrote:
> On Mon, Jul 25, 2011 at 4:03 PM, Andrew Berman  wrote:
>>
>> I know that this subject has been brought up before, but I'm still
>> wondering what the value of a connection pool is with Riak.
>
> It's a big deal:
>
> - It amortises TCP and PBC connection setup overhead over a number of
>   requests, thereby reducing average query latency.
> - It greatly reduces the likelihood that very busy clients and servers
>   will run out of limited resources that are effectively invisible,
>   e.g. closed TCP connections stuck in TIME_WAIT.
>
> Each of the above is a pretty big deal. Of course, connection pooling isn't
> free.
>
> - If you have many clients talking to a server sporadically, you may
>   end up with large numbers of open-and-idle connections on a server,
>   which will both consume resources and increase latency for all other
>   clients. This is usually only a problem with a very large number
>   (many thousands) of clients per server, and it usually only arises
>   with poorly written and tuned connection pooling libraries. But ...
> - ... Most connection pooling libraries are poorly written and tuned,
>   so they'll behave pathologically just when you need them not to.
> - Since you don't set up a connection per request, the requests where
>   you *do* need to set up a connection are going to be more expensive
>   than those where you don't, so you'll see jitter in your latency
>   profile. About 99.9% of users will never, ever care about this.
>>
>> Since Erlang processes are so small and fast to
>> create, is there really any overhead in having the gen_server create a
>> new connection (with the same client id) each time it needs to access
>> Riak?
>
> Of course. The overhead of Erlang processes has nothing to do with the cost
> of setting up a connection.
> Also, you really don't want to be using the same client ID repeatedly across
> different connections. That's an awesome way to cause bugs with vclock
> resolution that end up being very very hard to diagnose.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Andrew Berman
Thanks for all the replies guys!

I just want to make sure I'm totally clear on this.  Bob's solution
would work well with my design.  So basically, this would be the
workflow?

1.  check out connection from the pool
2.  set client id on connection (which would have some static and some
random component)
3.  perform multiple operations (gets, puts, etc.) which would be seen
as a single "transaction"
4.  check in the connection to the pool

This way once the connection is checked out from the pool, if another
user comes along he cannot get that same connection until it has been
checked back in, which would meet Justin's requirements.  However,
each time it's checked out, a new client id is created.

Does this sound reasonable and in line with proper client id usage?
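
In code, I picture it roughly like this (the pool API names are made
up):

Pid = pool:check_out(),                                  %% 1. check out
ClientId = <<"web-", (base64:encode(crypto:rand_bytes(6)))/binary>>,
riakc_pb_socket:set_client_id(Pid, ClientId),            %% 2. fresh client id
{ok, Obj} = riakc_pb_socket:get(Pid, <<"b">>, <<"k">>),  %% 3. one "transaction"
ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj, <<"new">>)),
pool:check_in(Pid).                                      %% 4. check in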

Thanks again!

Andrew


On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy  wrote:
> The simplest guidance on client IDs that I can give:
>
> If two mutation (PUT) operations could occur concurrently or without
> awareness of each other, then they should have different client IDs.
>
> As a result of the above: if you are sharing a connection, then you
> should use a different client ID for each separate user of that
> connection.
>
> -Justin
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Andrew Berman
Awesome!  Thanks for all your help guys.

On Tue, Jul 26, 2011 at 12:20 PM, Justin Sheehy  wrote:
> Yes, Andrew -- that is a fine approach to using a connection pool.
>
> Go for it.
>
> -Justin
>
>
>
> On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman  wrote:
>> Thanks for all the replies guys!
>>
>> I just want to make sure I'm totally clear on this.  Bob's solution
>> would work well with my design.  So basically, this would be the
>> workflow?
>>
>> 1.  check out connection from the pool
>> 2.  set client id on connection (which would have some static and some
>> random component)
>> 3.  perform multiple operations (gets, puts, etc.) which would be seen
>> as a single "transaction"
>> 4.  check in the connection to the pool
>>
>> This way once the connection is checked out from the pool, if another
>> user comes along he cannot get that same connection until it has been
>> checked back in, which would meet Justin's requirements.  However,
>> each time it's checked out, a new client id is created.
>>
>> Does this sound reasonable and in line with proper client id usage?
>>
>> Thanks again!
>>
>> Andrew
>>
>>
>> On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy  wrote:
>>> The simplest guidance on client IDs that I can give:
>>>
>>> If two mutation (PUT) operations could occur concurrently or without
>>> awareness of each other, then they should have different client IDs.
>>>
>>> As a result of the above: if you are sharing a connection, then you
>>> should use a different client ID for each separate user of that
>>> connection.
>>>
>>> -Justin
>>>
>>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Connection Pool with Erlang PB Client Necessary?

2011-07-28 Thread Andrew Berman
Cool, I'll check it out, though there appears to be something wrong
with your account: when I try to view the source, I get an error
back from GitHub.

On Thu, Jul 28, 2011 at 1:55 PM, Joel Meyer  wrote:
>
>
> On Tue, Jul 26, 2011 at 11:35 AM, Andrew Berman  wrote:
>>
>> Thanks for the reply Bryan.  This all makes sense.  I am fairly new to
>> Erlang and wasn't sure if using a gen_server solved some of the issues
>> with connections.  From what I've seen a lot of people simply make
>> calls to Riak directly from a resource and so I thought having a
>> gen_server in front of Riak would help to manage things better.
>> Apparently it doesn't.
>>
>> So, then, two more questions.  I have used connection pools in Java
>> like C3P0 and they can ramp up connections and then cull connections
>> when there is a period of inactivity.  The only pooler I've found that
>> does this is: https://github.com/seth/pooler .  Do you have any other
>> recommendations on connection poolers?
>
> I'm late to the party, but you could take a look at gen_server_pool
> (https://github.com/openx/gen_server_pool). It's a pooling library I wrote
> to provide pooling of gen_servers. I've used it mostly for Thrift clients,
> but Anthony (also on the list) uses it to pool riak_pb clients in
> webmachine. The basic idea is that you'd call
> gen_server_pool:start_link(...) wherever you'd normally call
> gen_server:start_link(...) and pass in a few extra args that control min and
> max pool size, as well as idle timeout. You can use the Pid you get back
> from that the same way you'd use the pid of your gen_server, except that all
> work gets dispatched to a member of a pool instead of a single gen_server.
> To be honest, I haven't tested out the open-source version I posted on
> GitHub (sorry, I've been busy), but it's just a slightly modified version of
> the internal library that's been used in production for several months with
> good results.
> Cheers,
> Joel
>
>>
>> Second, I'm still a little confused on client ID.  I thought client Id
>> represented an actual client, not a connection.  So, in my case, the
>> gen_server is one client which makes multiple connections.  After
>> seeing what you wrote and reading a bit more on it, it seems like
>> client Id should just be some random string (base64 encoded) that
>> should be generated on creating a connection.  Is that right?
>>
>> Thanks for your help!
>>
>> Andrew
>>
>> On Tue, Jul 26, 2011 at 9:39 AM, Bryan O'Sullivan 
>> wrote:
>> > On Mon, Jul 25, 2011 at 4:03 PM, Andrew Berman 
>> > wrote:
>> >>
>> >> I know that this subject has been brought up before, but I'm still
>> >> wondering what the value of a connection pool is with Riak.
>> >
>> > It's a big deal:
>> >
>> > - It amortises TCP and PBC connection setup overhead over a number
>> >   of requests, thereby reducing average query latency.
>> > - It greatly reduces the likelihood that very busy clients and
>> >   servers will run out of limited resources that are effectively
>> >   invisible, e.g. closed TCP connections stuck in TIME_WAIT.
>> >
>> > Each of the above is a pretty big deal. Of course, connection
>> > pooling isn't free.
>> >
>> > - If you have many clients talking to a server sporadically, you may
>> >   end up with large numbers of open-and-idle connections on a
>> >   server, which will both consume resources and increase latency for
>> >   all other clients. This is usually only a problem with a very
>> >   large number (many thousands) of clients per server, and it
>> >   usually only arises with poorly written and tuned connection
>> >   pooling libraries. But ...
>> > - ... Most connection pooling libraries are poorly written and
>> >   tuned, so they'll behave pathologically just when you need them
>> >   not to.
>> > - Since you don't set up a connection per request, the requests
>> >   where you *do* need to set up a connection are going to be more
>> >   expensive than those where you don't, so you'll see jitter in your
>> >   latency profile. About 99.9% of users will never, ever care about
>> >   this.
>> >>
>> >> Since Erlang processes are so small and fast to
>> >> create, is there really any overhead in having the gen_server create a
>> >> new connection (with the same client id) each time it 

Re: Connection Pool with Erlang PB Client Necessary?

2011-08-15 Thread Andrew Berman
So I looked at a bunch of pooling applications and none of them really
have the functionality and flexibility I'm used to with Java
connection pools.  So, I created my own OTP pooling application,
Pooly.  It supports multiple pools, flexible per-pool configuration
(idle timeout, max age of processes, initial count, acquire increment,
max pool size, and min pool size), and shrinks the pool based on those
parameters.

Feel free to check it out: https://github.com/aberman/pooly

--Andrew

On Tue, Jul 26, 2011 at 12:20 PM, Justin Sheehy  wrote:
> Yes, Andrew -- that is a fine approach to using a connection pool.
>
> Go for it.
>
> -Justin
>
>
>
> On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman  wrote:
>> Thanks for all the replies guys!
>>
>> I just want to make sure I'm totally clear on this.  Bob's solution
>> would work well with my design.  So basically, this would be the
>> workflow?
>>
>> 1.  check out connection from the pool
>> 2.  set client id on connection (which would have some static and some
>> random component)
>> 3.  perform multiple operations (gets, puts, etc.) which would be seen
>> as a single "transaction"
>> 4.  check in the connection to the pool
>>
>> This way once the connection is checked out from the pool, if another
>> user comes along he cannot get that same connection until it has been
>> checked back in, which would meet Justin's requirements.  However,
>> each time it's checked out, a new client id is created.
>>
>> Does this sound reasonable and in line with proper client id usage?
>>
>> Thanks again!
>>
>> Andrew
>>
>>
>> On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy  wrote:
>>> The simplest guidance on client IDs that I can give:
>>>
>>> If two mutation (PUT) operations could occur concurrently or without
>>> awareness of each other, then they should have different client IDs.
>>>
>>> As a result of the above: if you are sharing a connection, then you
>>> should use a different client ID for each separate user of that
>>> connection.
>>>
>>> -Justin
>>>
>>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Using Secondary Index Result

2011-10-03 Thread Andrew Berman
Hello,

I'm currently using the Riak Erlang client and when I do a get_index I
only get the keys back.  So, my question is, is it better to get the
keys, loop through them and run a get on them one by one, or is it
better to write my own MapRed job which queries the index and then
runs a map phase using the function map_object_value?  I remember
reading somewhere that you're better off running gets on multiple keys
vs using a MapRed job, but is this still the case for this use case
and with Riak 1.0?
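
For the MapRed variant, I mean something like this (a sketch, assuming
a binary index named "email" and Riak 1.0's index MapReduce inputs):

{ok, [{0, Values}]} = riakc_pb_socket:mapred(Pid,
    {index, <<"users">>, <<"email_bin">>, <<"foo@example.com">>},
    [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).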

Thanks,

Andrew

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Lager AMQP backend

2011-11-17 Thread Andrew Berman
Awesome stuff Jon!

On Thu, Nov 17, 2011 at 1:49 PM, Andrew Thompson  wrote:

> On Thu, Nov 17, 2011 at 02:05:58PM -0600, Jon Brisbin wrote:
> > I pushed to my Github a Lager backend for sending log messages to
> RabbitMQ via AMQP:
> >
> > https://github.com/jbrisbin/lager_amqp_backend
> >
> > It uses a re-connecting connection pool for sending messages, so it's
> pretty fast and will automatically recover if RabbitMQ goes down (but it
> does *not*, at the moment, internally queue log messages if it can't
> connect to the broker).
> >
> > The idea is to aggregate logging from riak_core applications, but you
> should be able to use it in Riak/DB as well.
> >
> Many thanks, it looks good. I've added a link to it from lager's README.
>
> Andrew
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Integrating Riak

2011-11-22 Thread Andrew Berman
Hello,

I'm a little confused on how I would go about integrating Riak into an
Erlang application.  Here's my use case.  I'm creating an HTTP proxy using
Misultin which intercepts any requests to my backend REST services and
provides all the session handling.  So, I would like to use Riak to store
the sessions for the front-end.  Since my proxy is written in Erlang, I
figured it makes more sense to have it run on the same node as Riak and use
the local Riak Erlang client to speed things up. So, questions:

1.  Would I just depend on riak_kv for my app?
2.  How do I go about configuring the riak application from within my
application?  I can't find any documentation on this.
3.  How do I get riak on a node to join the other nodes?
4.  Does it make more sense to just install a riak package and use the
erlang pb client?  Seems like it would be less efficient especially since
these will live on the same machine.

Thanks for any help!

Andrew
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Integrating Riak

2011-11-23 Thread Andrew Berman
Ok, cool, thanks Dave, makes sense.  Keep up the great work!

On Tue, Nov 22, 2011 at 9:31 PM, David Smith  wrote:

> On Tue, Nov 22, 2011 at 4:36 PM, Andrew Berman  wrote:
> > 4.  Does it make more sense to just install a riak package and use the
> > erlang pb client?  Seems like it would be less efficient especially since
> > these will live on the same machine.
>
> This is the preferred way to attack this problem. Separation of the
> functionality by O/S processes is appropriate and much easier to
> reason about in error situations. Loopback sockets to the PBs
> interface should be within a 1 ms of total request handling time.
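>
> For completeness, the loopback PB connection from the app would just
> be something like (sketch):
>
> {ok, Riak} = riakc_pb_socket:start_link("127.0.0.1", 8087),
> {ok, Obj} = riakc_pb_socket:get(Riak, <<"sessions">>, SessionId).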
>
> D.
>
> --
> Dave Smith
> Director, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Best practices for using the PB client

2011-12-30 Thread Andrew Berman
You should look into using HAProxy in front of your nodes.  Let HAProxy
load balance across all your nodes; if one goes down, HAProxy just
pulls it out of the load-balancing rotation automatically until it is
restored.  Then your pooler can pool connections through HAProxy
instead, so it doesn't have to worry at all about failed nodes.
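
A minimal sketch of the HAProxy side for the PB port (addresses are
made up):

listen riak_pb
    bind 127.0.0.1:8087
    mode tcp
    balance roundrobin
    server riak1 10.0.0.1:8087 check
    server riak2 10.0.0.2:8087 check
    server riak3 10.0.0.3:8087 check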

Also, shameless plug, I have a pooler as well which has a few more options
than pooler.  You can check it out here: https://github.com/aberman/pooly

--Andrew

On Fri, Dec 30, 2011 at 9:58 AM, Marc Campbell  wrote:

> Hey all,
>
> I'm looking for some best practices in handling connections when using the
> protocol buffer client.  Specifically, I have 3 nodes in my cluster, and
> need to figure out how to handle the situation when one of the nodes is
> down.
>
> I'm currently using a pooler app (https://github.com/seth/pooler) and
> this helps me distribute the load to all of the nodes, but when one goes
> down, the app doesn't recover nicely.
>
> I'm about to write some code in my app to handle this, but before I do, I
> thought I'd check for existing solutions and best practices:
>
> - Is there an existing connection pooling mechanism that someone has
> created which handles node failures automatically?
>
> If not, then I'm looking forward to writing it!
>
> Thank in advance,
> Marc
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com