Erlang Client: get_update_metadata vs get_metadata
Can someone explain the difference between the get_update_metadata and get_metadata functions in the Erlang PB Client for Riak? It's very confusing... Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Erlang Client: get_update_metadata vs get_metadata
Thanks guys, I'm more inclined to have an API like get_original_metadata and get_metadata, where get_metadata always returns whatever metadata is set on the object, new or original. In the current API, if calling get_update_metadata returns the original metadata when there are no changes, then I kinda fail to see a use case for the get_metadata call. Anyone? Thanks again! Andrew

On Tue, Jun 12, 2012 at 2:37 PM, Michael Radford wrote:
> Reid,
>
> I do understand why update_metadata exists. I guess what I'm
> suggesting is a better default behavior, especially for users who
> don't explicitly set any metadata values. (Or even if they do, for
> when all the metadatas are equivalent.)
>
> I.e., something like this for riakc_obj:get_update_metadata:
>
> get_update_metadata(#riakc_obj{updatemetadata=UM}=Object) ->
>     case UM of
>         undefined ->
>             try
>                 get_metadata(Object)
>             catch
>                 throw:no_metadata ->
>                     dict:new();
>                 throw:siblings ->
>                     default_resolve_metadatas(get_metadatas(Object))
>             end;
>         UM ->
>             UM
>     end.
>
> default_resolve_metadatas(Ms = [M | _]) ->
>     UniqueWritten = lists:usort([[KV || KV = {K, _V} <- dict:to_list(Md),
>                                         K =/= ?MD_LASTMOD,
>                                         K =/= ?MD_INDEX]
>                                  || Md <- Ms]),
>     case UniqueWritten of
>         [_] -> M;
>         [_, _ | _] -> throw(siblings)
>     end.
>
> Mike
>
> On Tue, Jun 12, 2012 at 1:18 PM, Reid Draper wrote:
> >
> > On Jun 12, 2012, at 2:56 PM, Michael Radford wrote:
> >
> >> get_metadata returns the metadata that was read from riak. But if
> >> allow_mult is true, and there is more than one sibling, then
> >> get_metadata throws the exception 'siblings'. You have to call
> >> get_metadatas to get a list with metadata for each sibling in that
> >> case.
> >>
> >> get_update_metadata returns the metadata that is to be written for the
> >> object (if you were to call riakc_pb_socket:put at that point). The
> >> update metadata is either a single value set explicitly with
> >> riakc_obj:update_metadata, or if none was set, and there is only one
> >> sibling, then the default is the value of get_metadata.
> >>
> >> A related question: if I'm not using any user-specified metadata at
> >> all, but I do have allow_mult turned on, then how do I choose which
> >> metadata to write back to riak after resolving the conflict? Or could
> >> I just call update_metadata with an empty dict in that case?
> > I'd recommend calling update_metadata with an empty dict. Be sure
> > to set the content_type as well.
> >>
> >> Right now, I have some conflict resolution code that uses the same
> >> default strategy as mochimedia's statebox_riak library, which
> >> arbitrarily chooses the first metadata. But this seems less than
> >> ideal: everything in the metadata is coming from riak, and some of it
> >> (e.g., last-modified timestamps) must be ignored when doing the
> >> update. So it seems like riak should be able to resolve the "metadata
> >> conflict" on its own: just prune all the metadata keys that aren't
> >> actually written, and then if the resulting pruned metadatas are
> >> identical, then there's no conflict. Or, if there is some reason why
> >> the user should prefer one metadata over another, then the client
> >> library should give the user some way to decide.
> > There are definitely cases where the user wants to choose one metadata
> > over another, or perhaps more commonly, "merge" them together, according
> > to some conflict resolution semantics. The client provides `update_metadata`
> > for this reason.
`select_sibling/2` can be used to choose a particular > {Metadata, Value} > > pair as well. > >> > >> Mike > >> > >> On Tue, Jun 12, 2012 at 11:25 AM, Andrew Berman > wrote: > >>> Can someone explain the difference between the get_update_metadata and > >>> get_metadata functions in the Erlang PB Client for Riak? It's very > >>> confusing... > >>> > >>> Thanks, > >>> > >>> Andrew > >>> > >>> ___ > >>> riak-users mailing list > >>> riak-users@lists.basho.com > >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > >>> > >> > >> ___ > >> riak-users mailing list > >> riak-users@lists.basho.com > >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
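[A rough sketch of how the two calls behave from the client side, assuming the riakc API described in this thread; the sibling handling shown here is deliberately naive and the bucket/key are made up:

  {ok, Obj0} = riakc_pb_socket:get(Pid, <<"user">>, <<"k1">>),
  %% get_metadata/1 returns what was read; it throws 'siblings' when
  %% allow_mult left more than one value behind.
  ReadMD = try riakc_obj:get_metadata(Obj0)
           catch throw:siblings -> hd(riakc_obj:get_metadatas(Obj0))
           end,
  %% get_update_metadata/1 returns what the next put would write; until
  %% update_metadata/2 is called it falls back to the read metadata.
  Obj1 = riakc_obj:update_metadata(Obj0, dict:new()),
  Obj2 = riakc_obj:update_content_type(Obj1, "application/json"),
  ToWrite = riakc_obj:get_update_metadata(Obj2),
  ok = riakc_pb_socket:put(Pid, Obj2).

As Reid notes above, an empty dict plus an explicit content type is the recommended reset when you carry no user metadata of your own.]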
Re: riak-erlang-client search changes
The more fields the better. I like this change. Andrew

On Aug 2, 2012 8:17 AM, "Dave Parfitt" wrote:
> Hello -
>
> We're considering some changes to the Riak Search functionality in
> riak-erlang-client for the upcoming Riak 1.2 release. The current behavior of
> the riakc_pb_socket:search/* functions is to return a list in the form:
> [[Index, Id],[Index2,Id2],...]
>
> With the new Riak Search protobuffs messages, we have the ability to also
> return fields from the search doc in the results (as additional values in the
> tuple). Also, it's possible to return the search results' "max score" and
> "number found". Does anyone have any objections to returning additional
> fields? To maintain semi-compatible behavior, it's possible to use the fl
> (field limit) search option to return just the id.
>
> Current behavior:
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>).
> {ok,[[<<"phrases_custom">>,<<"5">>],
>      [<<"phrases_custom">>,<<"1">>]]}
>
> Proposed behavior:
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>).
> {ok,[{<<"phrases_custom">>,
>       [{<<"id">>,<<"1">>},
>        {<<"phrase">>,<<"The quick brown fox jumps over the lazy dog">>}]},
>      {<<"phrases_custom">>,
>       [{<<"id">>,<<"5">>},
>        {<<"phrase">>,<<"The quick brown fox jumps over the lazy dog">>}]},
>      0.0,2]}
> %% Note the last two fields of the result are Max Score and Number Found.
>
> Semi-compatible behavior by specifying the fl (with the exception of max
> score and number found):
> riakc_pb_socket:search(Pid, <<"phrases_custom">>, <<"phrase:fox">>,
>                        [{fl,[<<"id">>]}], 5000, 5000).
> {ok,[{<<"phrases_custom">>,[{<<"id">>,<<"1">>}]},
>      {<<"phrases_custom">>,[{<<"id">>,<<"5">>}]},
>      0.0,2]}
>
> Cheers -
> Dave
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak won't start -- RHEL6
What do your config files look like? Do you have proper permissions on the Riak directory? On Aug 25, 2012 10:10 AM, "Vladimir Kupcov" wrote: > Hi, > > I installed Riak from .rpm package on RHEL6 on virtual machine. > Unfortunately, I can't get Riak to start. Here is the console output: > > > > [idcuser@vhost0536 ~]$ riak console > Attempting to restart script through sudo -H -u riak > Exec: /usr/lib64/riak/erts-5.9.1/bin/erlexec -boot > /usr/lib64/riak/releases/1.2.0/riak -embedded -config > /etc/riak/app.config -pa /usr/lib64/riak/basho-patches > -args_file /etc/riak/vm.args -- console > Root: /usr/lib64/riak > {error_logger,{{2012,8,25},{4,0,21}},"Protocol: ~p: register error: > ~p~n",["inet_tcp",{{badmatch,{error,etimedout}},[{inet_tcp_dist,listen,1,[{file,"inet_tcp_dist.erl"},{line,70}]},{net_kernel,start_protos,4,[{file,"net_kernel.erl"},{line,1314}]},{net_kernel,start_protos,3,[{file,"net_kernel.erl"},{line,1307}]},{net_kernel,init_node,2,[{file,"net_kernel.erl"},{line,1197}]},{net_kernel,init,1,[{file,"net_kernel.erl"},{line,357}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}]} > > {error_logger,{{2012,8,25},{4,0,21}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}},{ancestors,[net_sup,kernel_sup,<0.10.0>]},{messages,[]},{links,[#Port<0.194>,<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,24},{reductions,507}],[]]} > > {error_logger,{{2012,8,25},{4,0,21}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[[' > riak@127.0.0.1 > ',longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]} > > {error_logger,{{2012,8,25},{4,0,21}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]} > > {error_logger,{{2012,8,25},{4,0,21}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]} > {"Kernel pid > terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"} > > Crash dump was written to: /var/log/riak/erl_crash.dump > Kernel pid terminated (application_controller) > ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}) > > > > Any suggestions? > > Thank you, > Vlad > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Riak and Zookeeper
Hello, I'm wondering if anyone has explored the idea of using Zookeeper in front of Riak to handle locking. My thought is that a client goes to Zookeeper to get a lock on a key before updating. Any other client that wishes to update the same key must check for the existence of a lock. If it exists, an error is thrown; if not, the update proceeds. Once the client is finished with the key, it releases the lock. --Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
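[For concreteness, the flow being proposed looks roughly like this, with a hypothetical zk_lock module standing in for whichever ZooKeeper client is used; no particular library is assumed:

  update_with_lock(Pid, Bucket, Key, UpdateFun) ->
      case zk_lock:acquire({Bucket, Key}) of          %% hypothetical ZK call
          {ok, Lock} ->
              try
                  {ok, Obj} = riakc_pb_socket:get(Pid, Bucket, Key),
                  NewVal = UpdateFun(riakc_obj:get_value(Obj)),
                  riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj, NewVal))
              after
                  zk_lock:release(Lock)               %% hypothetical ZK call
              end;
          {error, locked} ->
              {error, key_locked}                     %% another client holds it
      end.
]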
Re: Riak and Zookeeper
Thanks for the info guys! On Fri, Mar 22, 2013 at 7:42 PM, Andrew Stone wrote: > You may also want to have a look at this post by Aphyr. There are a LOT of > caveats when trying to do this sort of thing. > > http://aphyr.com/posts/254-burn-the-library > > -Andrew > > > On Fri, Mar 22, 2013 at 9:02 PM, Sean Cribbs wrote: > >> Datomic does something similar -- except that instead of updating keys >> in-place, it only adds new values to Riak and advances the pointer(s) >> to the current state in ZK. >> http://www.infoq.com/presentations/Deconstructing-Database >> >> On Fri, Mar 22, 2013 at 7:32 PM, Andrew Berman wrote: >> > Hello, >> > >> > I'm wondering if anyone has explored the idea of using Zookeeper in >> front of >> > Riak to handle locking. My thought is that a client goes to Zookeeper >> to >> > get a lock on a key before updating. Any other client that wishes to >> update >> > the same key must check for the existence of a lock. If it exists, an >> error >> > is thrown, if not, then it proceeds. Once the client is finished with >> the >> > key, it releases the lock. >> > >> > --Andrew >> > >> > ___ >> > riak-users mailing list >> > riak-users@lists.basho.com >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > >> >> >> >> -- >> Sean Cribbs >> Software Engineer >> Basho Technologies, Inc. >> http://basho.com/ >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Urgent help with a down node.
Bryan, What version of Erlang? You should check this out: https://github.com/basho/riak_kv/issues/411 BTW - Google is your friend, which is how I found the above issue :) --Andrew On Sun, Jul 7, 2013 at 3:01 PM, Bryan Hughes wrote: > Hi Mark, > > DOH - sorry for the lack of detail. Didn't have enough coffee this morning. > > OS: CentOS release 6.3 (Final) > Riak: Riak 1.2.1 > > Hadn't had a chance to upgrade to 1.3 yet. > > Got the node back up - but not entirely sure why, which is a little > concerning. Been verifying the data, and everything looks intact. When I > try to run riak-admin status, I get the following (note I am not entirely > sure this was the case when we first set the node up): > > $ riak-admin status > Status failed, see log for details > > The log shows: > > 2013-07-07 14:55:03.858 [error] <0.12982.0>@riak_kv_console:status:173 > Status failed error:function_clause > 2013-07-07 14:55:03.858 [error] emulator Error in process <0.12983.0> on > node 'riak@127.0.0.1' with exit value: > {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer... > > > This is on a dev cluster with an out-of-the-box configuration using > bitcask. > > Thanks! > > Bryan > > On 7/7/13 2:51 PM, Mark Phillips wrote: > > Hi Bryan, > > I remember seeing something similar on the list a while ago. I'll dig > through the archives (Riak.markmail.org) if I have a few minutes later > tonight. > > In the meantime, what version of Riak is this? And what OS? > > Mark > > On Sunday, July 7, 2013, Bryan Hughes wrote: > >> Anyone familiar with this error message? >> >> 2013-07-07 12:51:42 =ERROR REPORT >> Hintfile >> './data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint' >> contains pointer 16555635 566 that is greater than total data size 16556032 >> 2013-07-07 12:51:45 =ERROR REPORT >> Hintfile >> './data/bitcask/114179815416476790484662877555959610910619729920/3.bitcask.hint' >> contains pointer 17817310 567 that is greater than total data size >> 17817600 >> 2013-07-07 12:51:46 =ERROR REPORT >> Hintfile >> './data/bitcask/159851741583067506678528028578343455274867621888/3.bitcask.hint' >> contains pointer 7573448 567 that is greater than total data size 7573504 >> 2013-07-07 12:51:46 =ERROR REPORT >> Bad datafile entry 1: >> {ok,<<131,104,2,109,0,0,0,9,65,80,73,67,79,85,78,84,83,109,0,0,0,33,55,56,54,57,52,49,56,49,94,103,111,115,101,114,118,105,99,101,95,99>>} >> 2013-07-07 12:51:56 =ERROR REPORT >> Hintfile >> './data/bitcask/730750818665451459101842416358141509827966271488/3.bitcask.hint' >> contains pointer 13229833 581 that is greater than total data size 13230080 >> 2013-07-07 12:52:05 =ERROR REPORT >> Hintfile >> './data/bitcask/1187470080331358621040493926581979953470445191168/3.bitcask.hint' >> contains pointer 23465420 578 that is greater than total data size 23465984 >> 2013-07-07 12:52:06 =ERROR REPORT >> Hintfile >> './data/bitcask/1210306043414653979137426502093171875652569137152/3.bitcask.hint' >> contains pointer 27733824 578 that is greater than total data size 27734016 >> 2013-07-07 12:52:07 =ERROR REPORT >> Hintfile >> './data/bitcask/1233142006497949337234359077604363797834693083136/3.bitcask.hint' >> contains pointer 15014008 578 that is greater than total data size >> 15014586 >> 2013-07-07 12:54:43 =ERROR REPORT >> Bad datafile entry, discarding(383/566 bytes) >> 2013-07-07 12:54:45 =ERROR
REPORT >> Bad datafile entry, discarding(276/567 bytes) >> 2013-07-07 12:54:46 =ERROR REPORT >> Bad datafile entry, discarding(42/567 bytes) >> 2013-07-07 12:54:57 =ERROR REPORT >> Bad datafile entry, discarding(233/581 bytes) >> 2013-07-07 12:55:06 =ERROR REPORT >> Bad datafile entry, discarding(550/578 bytes) >> 2013-07-07 12:55:07 =ERROR REPORT >> Bad datafile entry, discarding(178/578 bytes) >> 2013-07-07 12:56:00 =ERROR REPORT >> Error in process <0.1536.0> on node 'riak@127.0.0.1' with exit value: >> {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer... >> >> -- >> >> Bryan Hughes >> *Go Factory* >> http://www.go-factory.net >> >> *"Internet Class, Enterprise Grade"* >> >> >> > -- > > Bryan Hughes > CTO and Founder / *Go Factory* > (415) 515-7916 > > http://www.go-factory.net > > *"Internet Class, Enterprise Grade"* > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
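[The badarg in both logs comes from riak_kv_stat calling erlang:system_info(global_heaps_size), an argument newer Erlang releases reject; that is what the linked riak_kv issue covers. The shape of the workaround is something like this sketch (not the actual riak_kv patch):

  %% guard the removed system_info flag instead of letting it crash
  global_heaps_size() ->
      try
          erlang:system_info(global_heaps_size)
      catch
          error:badarg -> undefined  %% flag removed in newer OTP releases
      end.
]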
Links vs Key Filters for Performance
I was curious if anyone has any thoughts on what is more performant for secondary lookups, links or key filters. For example: I want to be able to look up a user by id and email:

*Link implementation:* Two buckets: user and user_email, where id is the key of user and email is the key of user_email. User_email contains no data but simply has a link pointing back to the proper user.

*Key Filter:* One bucket: user, where id_email is the key of the bucket. Lookups would use a key filter tokenizing the key and then looking up the id or email based on the proper token.

Obviously both work, but I'm curious what the implications are from a performance standpoint. Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
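[Sketched from the Erlang client, the two lookups would go roughly like this; this assumes the {Bucket, KeyFilters} input form for mapred and the id_email key scheme above:

  %% link implementation: one get on the secondary bucket, then follow the link
  {ok, EmailObj} = riakc_pb_socket:get(Pid, <<"user_email">>, <<"a@b.com">>),

  %% key filter implementation: scan the user bucket, tokenize "Id_Email"
  %% keys on "_" and keep those whose second token equals the email
  riakc_pb_socket:mapred(Pid,
      {<<"user">>, [[<<"tokenize">>, <<"_">>, 2], [<<"eq">>, <<"a@b.com">>]]},
      [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).
]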
Re: Links vs Key Filters for Performance
Ah, that makes sense. So is it the case that using the link implementation will always be faster? Or are there cases where it makes more sense to use a key filter? Thanks! --Andrew On Thu, May 5, 2011 at 3:44 PM, Aphyr wrote: > The key filter still has to walk the entire keyspace, which will make > fetches an O(n) operation as opposed to O(1). > > --Kyle > > > On 05/05/2011 03:35 PM, Andrew Berman wrote: > >> I was curious if anyone has any thoughts on what is more performant, >> links or key filters in terms of secondary links. For example: >> >> I want to be able to look up a user by id and email: >> >> *Link implementation:* >> >> Two buckets: user and user_email, where id is the key of user and email >> is the key of user_email. User_email contains no data but simply has a >> link pointing back to the proper user. >> >> *Key Filter:* >> >> One bucket: user, where id_email is the key of the bucket. Lookups >> would use a key filter tokenizing the id and then looking up the id or >> email based on the proper token. >> >> Obviously both work, but I'm curious what the implications are from a >> performance standpoint. >> >> Thanks, >> >> Andrew >> >> >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Links vs Key Filters for Performance
Ok, cool. Thanks for the example! On Thu, May 5, 2011 at 3:51 PM, Aphyr wrote: > I suppose if you had a really small number of keys in Riak it might be > faster, but you're almost certainly better off maintaining a second object > and making the lookup constant time. Here's an example: > > https://github.com/aphyr/risky/blob/master/lib/risky/indexes.rb > > --Kyle > > > On 05/05/2011 03:49 PM, Andrew Berman wrote: > >> Ah, that makes sense. So is it the case that using the link >> implementation will always be faster? Or are there cases where it makes >> more sense to use a key filter? >> >> Thanks! >> >> --Andrew >> >> On Thu, May 5, 2011 at 3:44 PM, Aphyr > <mailto:ap...@aphyr.com>> wrote: >> >>The key filter still has to walk the entire keyspace, which will >>make fetches an O(n) operation as opposed to O(1). >> >>--Kyle >> >> >>On 05/05/2011 03:35 PM, Andrew Berman wrote: >> >>I was curious if anyone has any thoughts on what is more >> performant, >>links or key filters in terms of secondary links. For example: >> >>I want to be able to look up a user by id and email: >> >>*Link implementation:* >> >>Two buckets: user and user_email, where id is the key of user >>and email >>is the key of user_email. User_email contains no data but >>simply has a >>link pointing back to the proper user. >> >>*Key Filter:* >> >>One bucket: user, where id_email is the key of the bucket. Lookups >>would use a key filter tokenizing the id and then looking up the >>id or >>email based on the proper token. >> >>Obviously both work, but I'm curious what the implications are >>from a >>performance standpoint. >> >>Thanks, >> >>Andrew >> >> >> >>___ >>riak-users mailing list >>riak-users@lists.basho.com <mailto:riak-users@lists.basho.com> >> >>http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> >> >> > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
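[The index object in Kyle's example is cheap to maintain by hand. A sketch of the write side, assuming <<"Links">> is the metadata key the Erlang client uses for links:

  put_email_index(Pid, Email, UserId) ->
      Obj = riakc_obj:new(<<"user_email">>, Email, <<>>,
                          "application/octet-stream"),
      MD  = dict:store(<<"Links">>, [{{<<"user">>, UserId}, <<"user">>}],
                       dict:new()),
      riakc_pb_socket:put(Pid, riakc_obj:update_metadata(Obj, MD)).
]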
Re: Links vs Key Filters for Performance
Yes, but this would be a bucket where each key would only ever have one link pointing back to the original user. --Andrew On Thu, May 5, 2011 at 3:52 PM, Jason J. W. Williams < jasonjwwilli...@gmail.com> wrote: > On Thu, May 5, 2011 at 4:49 PM, Andrew Berman wrote: > > Ah, that makes sense. So is it the case that using the link > implementation > > will always be faster? Or are there cases where it makes more sense to > use > > a key filter? > > There's a practical limit to how many links you can walk before > performance becomes unacceptable. > > -J > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Link walking
I don't totally understand what you're doing in your code, but it looks like you have the map phase before the link phase which doesn't make sense since you want the data from the link phase passed on to the map phase, not the other way around. On Fri, May 6, 2011 at 9:37 AM, Joshua Hanson wrote: > In my map/reduce query I would like to keep the input data > and also data from the link-walking phase. > > Here is some sample code: > > #insert message > db.save('messages', 'josh-123', 'secret message', function(err, message) { > #insert people object with link to message we just inserted > db.save('people', 'josh', {'profession': 'developer'}, > { links: [{ bucket: 'messages', key: 'josh-123', 'tag': 'message' }]}, > function(err, data) { > db.add([['people', 'josh']]) > .map({ 'source': 'Riak.mapValuesJson', 'keep': true}) > .link({ 'bucket': 'messages'}) > .run(function(err, data) { > if (err) return console.log(err); > console.log(data); > }) > } > ) > }) > > So, if I remove the 'link' phase I get the correct object from the map > phase > but if I instead remove the 'map' phase, I get the correct object from link > phase. > > However, having both together does not work. Is it possible to get at both > the original data and the data from link-walking in the same query? > _ > Joshua Hanson > e: joshua.b.han...@gmail.com > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Link walking
The results from one phase get passed on to the next phase, so if you want both data sets you need to run two different map-reduce queries. If you want the results from the link phase you need to run a link phase in addition to a map phase (link first and then you can do the map phase with Riak.mapValuesJson, which will give you the object that the link is pointing to). On Fri, May 6, 2011 at 12:14 PM, Joshua Hanson wrote: > The initial map phase is to grab the values from the input phase (.add()). > I want these values as well as the ones exposed from .link() but not sure > how to express it. > > _ > Joshua Hanson > e: joshua.b.han...@gmail.com > > > On Fri, May 6, 2011 at 3:01 PM, Andrew Berman wrote: > >> I don't totally understand what you're doing in your code, but it looks >> like you have the map phase before the link phase which doesn't make sense >> since you want the data from the link phase passed on to the map phase, not >> the other way around. >> >> On Fri, May 6, 2011 at 9:37 AM, Joshua Hanson >> wrote: >> >>> In my map/reduce query I would like to keep the input data >>> and also data from the link-walking phase. >>> >>> Here is some sample code: >>> >>> #insert message >>> db.save('messages', 'josh-123', 'secret message', function(err, message) >>> { >>> #insert people object with link to message we just inserted >>> db.save('people', 'josh', {'profession': 'developer'}, >>> { links: [{ bucket: 'messages', key: 'josh-123', 'tag': 'message' }]}, >>> function(err, data) { >>> db.add([['people', 'josh']]) >>> .map({ 'source': 'Riak.mapValuesJson', 'keep': true}) >>> .link({ 'bucket': 'messages'}) >>> .run(function(err, data) { >>> if (err) return console.log(err); >>> console.log(data); >>> }) >>> } >>> ) >>> }) >>> >>> So, if I remove the 'link' phase I get the correct object from the map >>> phase >>> but if I instead remove the 'map' phase, I get the correct object from >>> link phase. >>> >>> However, having both together does not work. Is it possible to get at >>> both the original data and the data from link-walking in the same query? >>> _ >>> Joshua Hanson >>> e: joshua.b.han...@gmail.com >>> >>> ___ >>> riak-users mailing list >>> riak-users@lists.basho.com >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >>> >>> >> > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
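[The same ordering from the Erlang client side, as a sketch: the link phase runs first, then a map over the linked objects ('_' matches any link tag):

  riakc_pb_socket:mapred(Pid,
      [{<<"people">>, <<"josh">>}],
      [{link, <<"messages">>, '_', false},
       {map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).
]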
Re: Error when trying to use a javascript custom extractor in Riaksearch
I'll preface this by saying I've never used this feature, but rs_extractfun should be set to one of the values defined in the Other Encodings section (http://wiki.basho.com/Riak-Search---Indexing-and-Querying-Riak-KV-Data.html). In your case: {jsanon, "function(a,b){return{\"user\":\"gpascale\", \"name\":\"greg\"};}"} Hope that helps, Andrew On Sat, May 21, 2011 at 7:48 PM, Greg Pascale wrote: > I've been banging my head against the wall trying to get a javascript > custom extractor working. Here is the simplest example I could come up with > to reproduce the error. > > *curl -v -X PUT -H "Content-Type: application/json" > http://localhost:8098/riak/test -d @data* > > where *@data* is a file that looks like > > *{"props":* * {"rs_extractfun":* * {"language" : "javascript", * * "source" : "function(a,b){return{\"user\":\"gpascale\", \"name\":\"greg\"};}"* * }* * }* *}* * * This completes successfully, and I can verify it by looking at the properties of the "test" bucket. > > *{"props":{"allow_mult":true,"basic_quorum":true,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"test","notfound_ok":false,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}],"pw":0,"r":"quorum","rs_extractfun":{"language":"javascript","source":"function(a,b){return{\"user\":\"gpascale\", \"name\":\"greg\"};}"},"rw":"quorum","small_vclock":10,"w":"quorum","young_vclock":20}} * > > However, when I try to insert something into the bucket, I get an error > > *curl -X PUT http://localhost:8098/riak/test/test1 -d "Hello, world!"* > > *{error,badarg,* *[{erlang,iolist_to_binary,* * [{hook_crashed,* * {riak_search_kv_hook,precommit,exit,* * {noproc,* * {gen_server,call,* * [riak_search_js_extract,reserve_vm,* * infinity]]},* * {wrq,append_to_response_body,2},* * {riak_kv_wm_raw,accept_doc_body,2},* * {webmachine_resource,resource_call,3},* * {webmachine_resource,do,3},* * {webmachine_decision_core,resource_call,1},* * {webmachine_decision_core,accept_helper,0},* * {webmachine_decision_core,decision,1}]}}* * * It doesn't matter if the thing I insert is a string, as above, or a real json > object that matches my schema - the error is the same. Any ideas what might > be going on here? > > Thanks, > -Greg > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
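[If the JavaScript VM path keeps misbehaving, the same extractor can also be written as an Erlang module and referenced with a modfun. A sketch, assuming the extractor contract described on the wiki page above (object in, list of field/value pairs out):

  -module(my_extractor).
  -export([extract/2]).

  %% set rs_extractfun to {modfun, my_extractor, extract} in the bucket props
  extract(_RiakObject, _Args) ->
      [{<<"user">>, <<"gpascale">>}, {<<"name">>, <<"greg">>}].
]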
Re: Problems starting riak
Have you tried it with just a default Erlang build (just a ./configure)? On Sat, Jun 4, 2011 at 1:50 PM, Alvaro Videla wrote: > > Hi, > > @roidrage told me to use an older version of Erlang. Problem is another > library I want to use only compiles with latest Erlang. > > On Jun 4, 2011, at 10:48 PM, Jason J. W. Williams wrote: > > > Have you tried with Erlang R14B02? > > > > Sent via iPhone > > > > Is your email Premiere? > > > > On Jun 4, 2011, at 5:25, Alvaro Videla wrote: > > > >> Hi, > >> > >> I'm trying to build riak using the latest Erlang release built with > these options: > >> > >> ./configure --enable-smp-support --enable-darwin-64bit > --enable-kernel-poll > >> > >> I've got riak using: git clone git://github.com/basho/riak.git > >> > >> After I did *make rel* I tried bin/riak console > >> > >> And I got the following errors: > >> > >> The on_load function for module bitcask_nifs returned {error, > >> {bad_lib, > >> "Library version > (1.0) not compatible (with 2.2)."}} > >> > >> And: > >> > >> =INFO REPORT 4-Jun-2011::14:20:15 === > >> alarm_handler: {clear,{disk_almost_full,"/"}} > >> {"Kernel pid > terminated",application_controller,"{application_start_failure,riak_kv,{shutdown,{riak_kv_app,start,[normal,[]]}}}"} > >> > >> If I run df -h it shows that I have available 32GB on my HD. > >> > >> head erl_crash.dump > >> > >> =erl_crash_dump:0.1 > >> Sat Jun 4 14:20:16 2011 > >> Slogan: Kernel pid terminated (application_controller) > ({application_start_failure,riak_kv,{shutdown,{riak_kv_app,start,[normal,[]]}}}) > >> System version: Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:2:2] > [rq:2] [async-threads:64] [hipe] [kernel-poll:true] > >> > >> Here's the full output: https://gist.github.com/1007857 > >> > >> Any help or hints? > >> > >> Cheers, > >> > >> Alvaro > >> ___ > >> riak-users mailing list > >> riak-users@lists.basho.com > >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > Sent form my Nokia 1100 > > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Recap as a Blog (?)
+1 and I love the idea of an RSS feed also. On Sun, Jun 5, 2011 at 11:51 PM, Mark Phillips wrote: > Hey All - > > Quick question: how would you feel if we turned the Riak Recap into a blog? > > I've spoken with various people in various channels about how to best > deliver the Recap, and while it's clear that it's a valuable tool for > the community, I'm not sure the Mailing List is still the best vehicle > through which to publish it. > > Publishing it as a blog (perhaps at "recap.basho.com") makes a lot of > sense as it would enable people to consume it without having to sift > through the rest of the mailing list traffic (and I know there are > more than a few of you who are on this ML only for the Recaps). More > importantly, I think it would bring more new readers to the Recap (and > more users to Riak). > > So, in the interest of convenience and expanding the size of the Riak > community, I think making it a blog might make sense. It would still > be written, published, and tweeted thrice weekly, just delivered to > you in your Reader, for example, instead of on the ML. > > As you all are the primary consumers of the Recap, I thought I would > gather some opinions before I did anything drastic. Anyone have > thoughts on this? > > +/-1s, rants, and all other expressions of opinion are encouraged. > > Thanks, > > Mark > > Community Manager > Basho Technologies > wiki.basho.com > twitter.com/pharkmillups > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Has there been any talk of dropping the PB interface?
I'm curious if there has been any talk of dropping the protocol buffers interface in favor of one of the more user-friendly serialization libraries which support more languages, like BERT (http://bert-rpc.org/) or MessagePack (http://msgpack.org/). I would think BERT is a perfect fit for Riak since it uses native Erlang binary, which would make exposing the Erlang client pretty seamless. I'm not sure of the speed difference, but the fact that Google only provides PB support in three languages seems to me to be a bit of a hindrance. Thoughts? --Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Pros/Cons to not storing JSON
I am using Riak via the Erlang Client API (PB) and I was storing my documents as JSON and then converting them to records when I pull them out of Riak, but I got to thinking that maybe this isn't the greatest approach. I'm thinking that maybe it's better to store documents just as the record itself (Erlang binary) and then just convert the binary back to the record when I pull them from Riak. I was wondering what the pros/cons are to this approach. Here's my list so far:

Pros:
- Native Erlang is stored, so less time to convert to the record
- Better support for nested records
- Smaller storage requirements and hence faster on the wire (?)

Cons:
- Not readable through Rekon (or other utils) without modification
- Can't use standard M/R functions which analyze the document (have to write all custom functions using Erlang)
- Not portable across languages

Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
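[The record-as-binary model being weighed here is essentially the external term format round trip below; a sketch, with a made-up record definition:

  -record(user, {id, email}).

  store(Pid, #user{id = Id} = User) ->
      Obj = riakc_obj:new(<<"user">>, Id, term_to_binary(User),
                          "application/octet-stream"),
      riakc_pb_socket:put(Pid, Obj).

  fetch(Pid, Id) ->
      {ok, Obj} = riakc_pb_socket:get(Pid, <<"user">>, Id),
      binary_to_term(riakc_obj:get_value(Obj)).   %% back to #user{}
]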
Re: Pros/Cons to not storing JSON
Ah, yes, you're right. Basically I'd have to either update all previous record docs with the new field or I'd have to have multiple record implementations to support the history of that particular record. That could be really, really ugly. Thanks Sean! On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs wrote: > Andrew, > > I think you're on the right track here, but I might add that you'll want to > have upgrade paths available if you're using records -- that is, version > them -- so that you can evolve their structure over time. That could be a > little hairy unless done carefully. > > That said, you could use BERT as the serialization format, making > implementing JavaScript M/R functions a little easier, and interop with > other languages. > > Sean Cribbs > Developer Advocate > Basho Technologies, Inc. > http://basho.com/ > > On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote: > > > I am using Riak using the Erlang Client API (PB) and I was storing my > documents as JSON and then converting them to records when I pull them out > of Riak, but I got to thinking that maybe this isn't the greatest approach. > I'm thinking that maybe it's better to store documents just as the record > itself (Erlang binary) and then just converting the binary back to the > record when I pull them from Riak. I was wondering what the pros/cons are > to this approach. Here's my list so far: > > > > Pros: > > > > Native Erlang is stored, so less time to convert to the record > > Better support for nested records > > Smaller storage requirements and hence faster on the wire (?) > > > > Cons: > > > > Not readable through Rekon (or other utils) without modification > > Can't use standard M/R functions which analyze the document (have to > write all custom functions using Erlang) > > Not portable across languages > > > > Thanks, > > > > Andrew > > ___ > > riak-users mailing list > > riak-users@lists.basho.com > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
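[One way to leave the upgrade path Sean mentions is to carry a version in the record and normalize on read; each historical shape then needs only one clause. A sketch with made-up fields:

  -record(user, {version = 2, id, email, created_at}).

  %% v1 records were written as {user, Id, Email}; upgrade them on read
  upgrade({user, Id, Email}) ->
      #user{version = 2, id = Id, email = Email, created_at = undefined};
  upgrade(#user{version = 2} = U) ->
      U.
]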
Re: Pros/Cons to not storing JSON
Cool, I've looked at BSON before for another project, and it might make sense in this case as well. Thanks! On Thu, Jun 9, 2011 at 2:26 PM, Will Moss wrote: > Hey Andrew, > > We're using BSON (bsonspec.org), because it stores binary (and other) data > types better than JSON and is also faster and more wire efficient (sounds > like about the same reasons you're considering leaving JSON). There are also > libraries to parse BSON it in just about every language. > > I haven't tried using it in a Erlang map-reduce yet (we don't do > map-reduces for any of our production work), but there is a library out > there so it shouldn't be too hard. > > Will > > > On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs wrote: > >> Andrew, >> >> I think you're on the right track here, but I might add that you'll want >> to have upgrade paths available if you're using records -- that is, version >> them -- so that you can evolve their structure over time. That could be a >> little hairy unless done carefully. >> >> That said, you could use BERT as the serialization format, making >> implementing JavaScript M/R functions a little easier, and interop with >> other languages. >> >> Sean Cribbs >> Developer Advocate >> Basho Technologies, Inc. >> http://basho.com/ >> >> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote: >> >> > I am using Riak using the Erlang Client API (PB) and I was storing my >> documents as JSON and then converting them to records when I pull them out >> of Riak, but I got to thinking that maybe this isn't the greatest approach. >> I'm thinking that maybe it's better to store documents just as the record >> itself (Erlang binary) and then just converting the binary back to the >> record when I pull them from Riak. I was wondering what the pros/cons are >> to this approach. Here's my list so far: >> > >> > Pros: >> > >> > Native Erlang is stored, so less time to convert to the record >> > Better support for nested records >> > Smaller storage requirements and hence faster on the wire (?) >> > >> > Cons: >> > >> > Not readable through Rekon (or other utils) without modification >> > Can't use standard M/R functions which analyze the document (have to >> write all custom functions using Erlang) >> > Not portable across languages >> > >> > Thanks, >> > >> > Andrew >> > ___ >> > riak-users mailing list >> > riak-users@lists.basho.com >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Pros/Cons to not storing JSON
Well, I'd rather not do it that way, converting it to a string. But another thing I can do is convert the record to a proplist and then store that in the database. When I pull it out of the database, I would loop through the fields of the record definition and use each field as a key into the proplist to get the value out. This would avoid the issue Sean raised with storing a record directly. On Thu, Jun 9, 2011 at 2:41 PM, Evans, Matthew wrote: > Hi, > > > > Why not convert your term to a string, and then you can do map reduce, can’t > you? > > > > Term to a string… > > > > 1> Term = [{one,1},{two,2},{three,3}]. > > [{one,1},{two,2},{three,3}] > > 2> String = lists:flatten(io_lib:format("~p.", [Term])). > > "[{one,1},{two,2},{three,3}]." > > > > Save “String” in riak… > > > > Then back to a term… > > > > 3> String = "[{one,1},{two,2},{three,3}].". > > "[{one,1},{two,2},{three,3}]." > > 4> {ok,Tok,_} = erl_scan:string(String). > > 5> {ok,Term} = erl_parse:parse_term(Tok). > > {ok,[{one,1},{two,2},{three,3}]} > > > > /Matt > > > -- > > *From:* riak-users-boun...@lists.basho.com [mailto: > riak-users-boun...@lists.basho.com] *On Behalf Of *Will Moss > *Sent:* Thursday, June 09, 2011 5:27 PM > *To:* Sean Cribbs > *Cc:* riak-users > *Subject:* Re: Pros/Cons to not storing JSON > > > > Hey Andrew, > > > > We're using BSON (bsonspec.org), because it stores binary (and other) data > types better than JSON and is also faster and more wire efficient (sounds > like about the same reasons you're considering leaving JSON). There are also > libraries to parse BSON in just about every language. > > > > I haven't tried using it in an Erlang map-reduce yet (we don't do > map-reduces for any of our production work), but there is a library out > there so it shouldn't be too hard. > > > > Will > > > > On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs wrote: > > Andrew, > > I think you're on the right track here, but I might add that you'll want to > have upgrade paths available if you're using records -- that is, version > them -- so that you can evolve their structure over time. That could be a > little hairy unless done carefully. > > That said, you could use BERT as the serialization format, making > implementing JavaScript M/R functions a little easier, and interop with > other languages. > > Sean Cribbs > Developer Advocate > Basho Technologies, Inc. > http://basho.com/ > > > On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote: > > > I am using Riak using the Erlang Client API (PB) and I was storing my > documents as JSON and then converting them to records when I pull them out > of Riak, but I got to thinking that maybe this isn't the greatest approach. > I'm thinking that maybe it's better to store documents just as the record > itself (Erlang binary) and then just converting the binary back to the > record when I pull them from Riak. I was wondering what the pros/cons are > to this approach. Here's my list so far: > > Pros: > > Native Erlang is stored, so less time to convert to the record > > Better support for nested records > > Smaller storage requirements and hence faster on the wire (?) 
> > > > Cons: > > > > Not readable through Rekon (or other utils) without modification > > Can't use standard M/R functions which analyze the document (have to > write all custom functions using Erlang) > > Not portable across languages > > > > Thanks, > > > > Andrew > > > ___ > > riak-users mailing list > > riak-users@lists.basho.com > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
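[The proplist round trip is mechanical, since record_info/2 exposes the field names at compile time. A sketch with a made-up record:

  -record(user, {id, email}).

  record_to_proplist(#user{} = U) ->
      lists:zip(record_info(fields, user), tl(tuple_to_list(U))).

  proplist_to_record(Props) ->
      #user{id    = proplists:get_value(id, Props),
            email = proplists:get_value(email, Props)}.
]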
Link Walking via Map Reduce
Hello, I'm having issues link walking using the Map Reduce link function. I am using HEAD from Git, so it's possible that's the issue, but here is what is happening. I've got two buckets, user and user_email, where user_email contains a link to the user. When I run this:

{
  "inputs": [["user_email", "myem...@email.com"]],
  "query": [
    {"link": {"bucket": "user", "tag": "user"}}
  ]
}

I only get [["user","LikiWUPJSFuxtrhCYpsPfg","user"]] returned. The second I add a map function, even the simplest one (function(v) { return [v]; }), I get a "map_reduce error":

{
  "inputs": [["user_email", "myem...@email.com"]],
  "query": [
    {"link": {"bucket": "user", "tag": "user"}},
    {"map": {"language": "javascript", "source": "function(v) { return [v]; }"}}
  ]
}

Is this functionality broken? I am following what it says on the Wiki for the MapRed version of link walking. When I use HTTP link walking, it works correctly. Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Link Walking via Map Reduce
Hey Ryan, Here is the error from the sasl log. It looks like some sort of encoding error. Any thoughts on how to fix this? I am storing the data as BERT encoded binary and I set the content-type as application/octet-stream. Thanks for your help! Andrew ERROR REPORT 9-Jun-2011::21:37:05 === ** Generic server <0.5996.21> terminating ** Last message in was {batch_dispatch, {map, {jsanon,<<"function(value) {return [value];}">>}, [{struct, [{<<"bucket">>,<<"user">>}, {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, {<<"vclock">>, <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, {<<"values">>, [{struct, [{<<"metadata">>, {struct, [{<<"X-Riak-VTag">>, <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, {<<"X-Riak-Last-Modified">>, <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, {<<"data">>, <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]}, <<"user">>,none]}} ** When Server state == {state,<0.143.0>,riak_kv_js_map,#Port<0.92614>,true} ** Reason for termination == ** {function_clause,[{js_driver,eval_js, [#Port<0.92614>,{error,bad_encoding},5000]}, {riak_kv_js_vm,invoke_js,2}, {riak_kv_js_vm,define_invoke_anon_js,3}, {riak_kv_js_vm,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]} =CRASH REPORT 9-Jun-2011::21:37:05 === crasher: initial call: riak_kv_js_vm:init/1 pid: <0.5996.21> registered_name: [] exception exit: {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]} in function gen_server:terminate/6 in call from proc_lib:init_p_do_apply/3 ancestors: [riak_kv_js_sup,riak_kv_sup,<0.128.0>] messages: [] links: [<0.142.0>,<0.6009.21>] dictionary: [] trap_exit: false status: running heap_size: 4181 stack_size: 24 reductions: 2586 neighbours: neighbour: [{pid,<0.6009.21>},{registered_name,[]},{initial_call,{riak_kv_mapper,init,[Argument__1]}},{current_function,{gen,do_call,4}},{ancestors,[riak_kv_mapper_sup,riak_kv_sup,<0.128.0>]},{messages,[]},{links,[<0.5996.21>,<12337.6227.21>,<0.162.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,987},{stack_size,53},{reductions,1043}] =SUPERVISOR REPORT 9-Jun-2011::21:37:05 === Supervisor: {local,riak_kv_js_sup} Context: child_terminated Reason: {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]} Offender: [{pid,<0.5996.21>},{name,undefined},{mfargs,{riak_kv_js_vm,start_link,undefined}},{restart_type,temporary},{shutdown,2000},{child_type,worker}] On Wed, Jun 22, 2011 at 6:10 PM, Ryan Zezeski wrote: > > Andrew, > Maybe you could elaborate on the error? I tested this against master (commit > below) just now with success. > 2b1a474f836d962fa035f48c05452e22fc6c2193 Change dependency to allow for > R14B03 as well as R14B02 > -Ryan > On Wed, Jun 22, 2011 at 7:03 PM, Andrew Berman wrote: >> >> Hello, >> I'm having issues link walking using the Map Reduce link function. I am >> using HEAD from Git, so it's possible that's the issue, but here is what is >> happening. >> I've got two buckets, user and user_email where user_email contains a link >> to the user. >> When I run this: >> { >> "inputs": [ >> [ >> "user_email", >> "myem...@email.com" >> ] >> ], >> "query": [ >> { >> "link": { >> "bucket": "user", >> "tag&q
Re: Link Walking via Map Reduce
Mathias, I thought Riak was content agnostic when it came to the data being stored? The map phase is not running Riak.mapValuesJson, so why is the data itself going through the JSON parser? The JSON value returned by v with all the info is valid and I see the struct atom in there so mochijson2 can parse it properly, but I'm not clear why mochijson2 would be coughing at the data part. --Andrew On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer wrote: > Andrew, > > you're indeed hitting a JSON encoding problem here. BERT is binary data, and > won't make the JSON parser happy when trying to deserialize it, before > handing it into the map phase. You have two options here, and none of them > will involve JavaScript as the MapReduce language. > > 1.) Use the Protobuff API, use Erlang functions to return the value or object > (e.g. riak_mapreduce:map_object_value or riak_kv_mapreduce:map_identity), and > then run MapReduce queries with the content type > 'application/x-erlang-binary'. However, you're constrained by client > libraries here, e.g. Ruby and Python don't support this content type for > MapReduce on the Protobuffs interface yet, so you'd either implement > something custom, or resort to a client that does, riak-erlang-client comes > to mind, though it was proven to be possible using the Java client too, > thanks to Russell. See [1] and [2] > > 2.) Convert the result from BERT into a JSON-parseable structure inside an > Erlang map function, before it's returned to the client. > > The second approach certainly is less restrictive in terms of API usage, but > certainly involves some overhead inside of the MapReduce request itself, but > is less prone to encoding/decoding issues with JSON. > > Mathias Meyer > Developer Advocate, Basho Technologies > > [1] > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004447.html > [2] > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004485.html > > On Donnerstag, 23. Juni 2011 at 07:59, Andrew Berman wrote: > >> Hey Ryan, >> >> Here is the error from the sasl log. It looks like some sort of >> encoding error. Any thoughts on how to fix this? I am storing the >> data as BERT encoded binary and I set the content-type as >> application/octet-stream. >> >> Thanks for your help! 
>> >> Andrew >> >> ERROR REPORT 9-Jun-2011::21:37:05 === >> ** Generic server <0.5996.21> terminating >> ** Last message in was {batch_dispatch, >> {map, >> {jsanon,<<"function(value) {return [value];}">>}, >> [{struct, >> [{<<"bucket">>,<<"user">>}, >> {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, >> {<<"vclock">>, >> >> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, >> {<<"values">>, >> [{struct, >> [{<<"metadata">>, >> {struct, >> [{<<"X-Riak-VTag">>, >> <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, >> {<<"X-Riak-Last-Modified">>, >> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, >> {<<"data">>, >> >> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]}, >> <<"user">>,none]}} >> ** When Server state == {state,<0.143.0>,riak_kv_js_map,#Port<0.92614>,true} >> ** Reason for termination == >> ** {function_clause,[{js_driver,eval_js, >> [#Port<0.92614>,{error,bad_encoding},5000]}, >> {riak_kv_js_vm,invoke_js,2}, >> {riak_kv_js_vm,define_invoke_anon_js,3}, >> {riak_kv_js_vm,handle_call,3}, >> {gen_server,handle_msg,5}, >> {proc_lib,init_p_do_apply,3}]} >> >> =CRASH REPORT 9-Jun-2011::21:37:05 === >> crasher: >> initial call: riak_kv_js_vm:init/1 >> pid: <0.5996.21> >> registered_name: [] >> exception exit: >> {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]} >> in function gen_server:terminate/6 >> in call from proc_lib:init_p_do_apply/3 >> ancestors: [riak_kv_js_sup,riak_kv_sup,<0.128.0>] >> messages: [] >> links: [<0.142.0>,<0.6009.21>] >> dictionary: [] >> trap_exit: false >> status: running >> heap_size: 4181 >> stack_size: 24 >> reducti
Re: Link Walking via Map Reduce
But isn't the value itself JSON? Meaning this part: {struct, [{<<"bucket">>,<<"user">>}, {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, {<<"vclock">>, <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, {<<"values">>, [{struct, [{<<"metadata">>, {struct, [{<<"X-Riak-VTag">>, <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, {<<"X-Riak-Last-Modified">>, <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, {<<"data">>, <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]} So the only thing that is not JSON is the data itself, but when I get the value, shouldn't I be getting the all the info above which is JSON encoded? Thank you all for your help, Andrew On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs wrote: > The object has to be JSON-encoded to be marshalled into the Javascript VM, > and also on the way out if the Accept header indicates application/json. So > you have two places where it needs to be encodable into JSON. > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman wrote: >> >> Mathias, >> >> I thought Riak was content agnostic when it came to the data being >> stored? The map phase is not running Riak.mapValuesJson, so why is >> the data itself going through the JSON parser? The JSON value >> returned by v with all the info is valid and I see the struct atom in >> there so mochijson2 can parse it properly, but I'm not clear why >> mochijson2 would be coughing at the data part. >> >> --Andrew >> >> On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer wrote: >> > Andrew, >> > >> > you're indeed hitting a JSON encoding problem here. BERT is binary data, >> > and won't make the JSON parser happy when trying to deserialize it, before >> > handing it into the map phase. You have two options here, and none of them >> > will involve JavaScript as the MapReduce language. >> > >> > 1.) Use the Protobuff API, use Erlang functions to return the value or >> > object (e.g. riak_mapreduce:map_object_value or >> > riak_kv_mapreduce:map_identity), and then run MapReduce queries with the >> > content type 'application/x-erlang-binary'. However, you're constrained by >> > client libraries here, e.g. Ruby and Python don't support this content type >> > for MapReduce on the Protobuffs interface yet, so you'd either implement >> > something custom, or resort to a client that does, riak-erlang-client comes >> > to mind, though it was proven to be possible using the Java client too, >> > thanks to Russell. See [1] and [2] >> > >> > 2.) Convert the result from BERT into a JSON-parseable structure inside >> > an Erlang map function, before it's returned to the client. >> > >> > The second approach certainly is less restrictive in terms of API usage, >> > but certainly involves some overhead inside of the MapReduce request >> > itself, >> > but is less prone to encoding/decoding issues with JSON. >> > >> > Mathias Meyer >> > Developer Advocate, Basho Technologies >> > >> > [1] >> > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004447.html >> > [2] >> > http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004485.html >> > >> > On Donnerstag, 23. Juni 2011 at 07:59, Andrew Berman wrote: >> > >> >> Hey Ryan, >> >> >> >> Here is the error from the sasl log. It looks like some sort of >> >> encoding error. Any thoughts on how to fix this? I am storing the >> >> data as BERT encoded binary and I set the content-type as >> >> application/octet-stream. >> >> >> >> Thanks for your help! 
>> >> >> >> Andrew >> >> >> >> ERROR REPORT 9-Jun-2011::21:37:05 === >> >> ** Generic server <0.5996.21> terminating >> >> ** Last message in was {batch_dispatch, >> >> {map, >> >> {jsanon,<<"function(value) {return [
Re: Link Walking via Map Reduce
Ah, ok, that makes sense. One more question, when I use the HTTP link walking, I do get the data back as expected, so is there a way to replicate this in a Map-Reduce job or using the Erlang PBC (which I forgot to mention is what I'm using and the reason I'm not using the HTTP link walking method)? --Andrew On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer wrote: > Andrew, > > the data looks like JSON, but it's not valid JSON. Have a look at the list > that's in the data section (which is your BERT encoded data), the first > character in that list is 131, which is not a valid UTF-8 character, and JSON > only allows valid UTF-8 characters. With a binary-encoded format, there's > always a chance for a control character like that to blow up the JSON > generated before and after the MapReduce code is executed. With JSON, content > agnosticism only goes as far as the set of legal characters allows. > > On a side note, if the data were a valid representation of a string, you > would see it as a string in the log file as well, not just as a list of > numbers. > > Mathias Meyer > Developer Advocate, Basho Technologies > > > On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote: > >> But isn't the value itself JSON? Meaning this part: >> >> {struct, >> [{<<"bucket">>,<<"user">>}, >> {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, >> {<<"vclock">>, >> >> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, >> {<<"values">>, >> [{struct, >> [{<<"metadata">>, >> {struct, >> [{<<"X-Riak-VTag">>, >> <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, >> {<<"X-Riak-Last-Modified">>, >> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, >> {<<"data">>, >> >> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]} >> >> So the only thing that is not JSON is the data itself, but when I get >> the value, shouldn't I be getting the all the info above which is JSON >> encoded? >> >> Thank you all for your help, >> >> Andrew >> >> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs > (mailto:s...@basho.com)> wrote: >> > The object has to be JSON-encoded to be marshalled into the Javascript VM, >> > and also on the way out if the Accept header indicates application/json. So >> > you have two places where it needs to be encodable into JSON. >> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman > > (mailto:rexx...@gmail.com)> wrote: >> > > >> > > Mathias, >> > > >> > > I thought Riak was content agnostic when it came to the data being >> > > stored? The map phase is not running Riak.mapValuesJson, so why is >> > > the data itself going through the JSON parser? The JSON value >> > > returned by v with all the info is valid and I see the struct atom in >> > > there so mochijson2 can parse it properly, but I'm not clear why >> > > mochijson2 would be coughing at the data part. >> > > >> > > --Andrew >> > > >> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer > > > (mailto:math...@basho.com)> wrote: >> > > > Andrew, >> > > > >> > > > you're indeed hitting a JSON encoding problem here. BERT is binary >> > > > data, >> > > > and won't make the JSON parser happy when trying to deserialize it, >> > > > before >> > > > handing it into the map phase. You have two options here, and none of >> > > > them >> > > > will involve JavaScript as the MapReduce language. >> > > > >> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or >> > > > object (e.g. riak_mapreduce:map_object_value or >> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries with >> > > > the >> > > > content type 'application/x-erlang-binary'. 
However, you're >> > > > constrained by >> > > > client libraries here, e.g. Ruby and Python don't support this content >> > > > type >> > > > for MapReduce on the Protobuffs interface yet, so you'd either >> > > > implement >> > > > something custom, or resort to a client that does, riak-erlang-client
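For reference, Mathias's first option looks roughly like this from the riak-erlang-client. This is a minimal, untested sketch, and the bucket/key input is made up (in the link-walking case you would feed it whatever inputs your link phase produces); the client encodes the job as 'application/x-erlang-binary' for you, so the BERT value never passes through a JSON encoder:

{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
%% Run the map phase server-side in Erlang instead of JavaScript.
{ok, [{0, Values}]} =
    riakc_pb_socket:mapred(
        Pid,
        [{<<"user">>, <<"some-key">>}],  %% hypothetical bucket/key input
        [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).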
Re: Link Walking via Map Reduce
Yes, I am able to do that, but I feel like this completely defeats the purpose of a link by having to do two different calls. I might as well just store the user id in the data for user_email instead and not use a link at all with your method. What advantage does a link offer at that point? On Thu, Jun 23, 2011 at 8:55 AM, Jeremiah Peschka wrote: > HTTP link walking will get you back the data in the way that you'd expect. > It's a two-step process using PBC. MR link phases will give you a list of > [bucket, key, tag] that you can then use to pull back the records from > Riak. > --- > Jeremiah Peschka > Founder, Brent Ozar PLF, LLC > > On Thursday, June 23, 2011 at 8:52 AM, Andrew Berman wrote: > > Ah, ok, that makes sense. One more question, when I use the HTTP link > walking, I do get the data back as expected, so is there a way to > replicate this in a Map-Reduce job or using the Erlang PBC (which I > forgot to mention is what I'm using and the reason I'm not using the > HTTP link walking method)? > > --Andrew > > On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer wrote: > > Andrew, > > the data looks like JSON, but it's not valid JSON. Have a look at the list > that's in the data section (which is your BERT encoded data), the first > character in that list is 131, which is not a valid UTF-8 character, and > JSON only allows valid UTF-8 characters. With a binary-encoded format, > there's always a chance for a control character like that to blow up the > JSON generated before and after the MapReduce code is executed. With JSON, > content agnosticism only goes as far as the set of legal characters allows. > > On a side note, if the data were a valid representation of a string, you > would see it as a string in the log file as well, not just as a list of > numbers. > > Mathias Meyer > Developer Advocate, Basho Technologies > > > On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote: > > But isn't the value itself JSON? Meaning this part: > > {struct, > [{<<"bucket">>,<<"user">>}, > {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, > {<<"vclock">>, > > <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, > {<<"values">>, > [{struct, > [{<<"metadata">>, > {struct, > [{<<"X-Riak-VTag">>, > <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, > {<<"X-Riak-Last-Modified">>, > <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, > {<<"data">>, > > <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]} > > So the only thing that is not JSON is the data itself, but when I get > the value, shouldn't I be getting the all the info above which is JSON > encoded? > > Thank you all for your help, > > Andrew > > On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs (mailto:s...@basho.com)> wrote: > > The object has to be JSON-encoded to be marshalled into the Javascript VM, > and also on the way out if the Accept header indicates application/json. So > you have two places where it needs to be encodable into JSON. > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman (mailto:rexx...@gmail.com)> wrote: > > Mathias, > > I thought Riak was content agnostic when it came to the data being > stored? The map phase is not running Riak.mapValuesJson, so why is > the data itself going through the JSON parser? The JSON value > returned by v with all the info is valid and I see the struct atom in > there so mochijson2 can parse it properly, but I'm not clear why > mochijson2 would be coughing at the data part. > > --Andrew > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer (mailto:math...@basho.com)> wrote: > > Andrew, > > you're indeed hitting a JSON encoding problem here. 
BERT is binary data, > and won't make the JSON parser happy when trying to deserialize it, before > handing it into the map phase. You have two options here, and none of them > will involve JavaScript as the MapReduce language. > > 1.) Use the Protobuff API, use Erlang functions to return the value or > object (e.g. riak_mapreduce:map_object_value or > riak_kv_mapreduce:map_identity), and then run MapReduce queries with the > content type 'application/x-erlang-binary'. However, you're constrained by > client libraries here, e.g. Ruby and Python don't support this content type > for MapReduce on the Protobuffs interface yet, so you'd either implement > something custom, or resort to a client that does, riak-erlang-client
Re: Link Walking via Map Reduce
Mathias, I took the BERT encoding and then encoded that as Base64 which should pass the test of valid UTF-8 characters. However, now I'm starting to think that maybe doing two encodings and storing that for the purpose of saving space is not worth the trade-off in performance vs just storing the data in JSON format. Do you guys have any thoughts on this? Thanks, Andrew On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer wrote: > Andrew, > > the data looks like JSON, but it's not valid JSON. Have a look at the list > that's in the data section (which is your BERT encoded data), the first > character in that list is 131, which is not a valid UTF-8 character, and JSON > only allows valid UTF-8 characters. With a binary-encoded format, there's > always a chance for a control character like that to blow up the JSON > generated before and after the MapReduce code is executed. With JSON, content > agnosticism only goes as far as the set of legal characters allows. > > On a side note, if the data were a valid representation of a string, you > would see it as a string in the log file as well, not just as a list of > numbers. > > Mathias Meyer > Developer Advocate, Basho Technologies > > > On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote: > >> But isn't the value itself JSON? Meaning this part: >> >> {struct, >> [{<<"bucket">>,<<"user">>}, >> {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, >> {<<"vclock">>, >> >> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, >> {<<"values">>, >> [{struct, >> [{<<"metadata">>, >> {struct, >> [{<<"X-Riak-VTag">>, >> <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, >> {<<"X-Riak-Last-Modified">>, >> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, >> {<<"data">>, >> >> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]} >> >> So the only thing that is not JSON is the data itself, but when I get >> the value, shouldn't I be getting the all the info above which is JSON >> encoded? >> >> Thank you all for your help, >> >> Andrew >> >> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs > (mailto:s...@basho.com)> wrote: >> > The object has to be JSON-encoded to be marshalled into the Javascript VM, >> > and also on the way out if the Accept header indicates application/json. So >> > you have two places where it needs to be encodable into JSON. >> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman > > (mailto:rexx...@gmail.com)> wrote: >> > > >> > > Mathias, >> > > >> > > I thought Riak was content agnostic when it came to the data being >> > > stored? The map phase is not running Riak.mapValuesJson, so why is >> > > the data itself going through the JSON parser? The JSON value >> > > returned by v with all the info is valid and I see the struct atom in >> > > there so mochijson2 can parse it properly, but I'm not clear why >> > > mochijson2 would be coughing at the data part. >> > > >> > > --Andrew >> > > >> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer > > > (mailto:math...@basho.com)> wrote: >> > > > Andrew, >> > > > >> > > > you're indeed hitting a JSON encoding problem here. BERT is binary >> > > > data, >> > > > and won't make the JSON parser happy when trying to deserialize it, >> > > > before >> > > > handing it into the map phase. You have two options here, and none of >> > > > them >> > > > will involve JavaScript as the MapReduce language. >> > > > >> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or >> > > > object (e.g. 
riak_mapreduce:map_object_value or >> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries with >> > > > the >> > > > content type 'application/x-erlang-binary'. However, you're >> > > > constrained by >> > > > client libraries here, e.g. Ruby and Python don't support this content >> > > > type >> > > > for MapReduce on the Protobuffs interface yet, so you'd either >> > > > implement >> > > > something custom, or resort to a client that does, riak-erlang-client
Re: Link Walking via Map Reduce
And related, does Bitcask have any sort of compression built into it? On Fri, Jun 24, 2011 at 2:58 PM, Andrew Berman wrote: > Mathias, > > I took the BERT encoding and then encoded that as Base64 which should > pass the test of valid UTF-8 characters. However, now I'm starting to > think that maybe doing two encodings and storing that for the purpose > of saving space is not worth the trade-off in performance vs just > storing the data in JSON format. Do you guys have any thoughts on > this? > > Thanks, > > Andrew > > On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer wrote: >> Andrew, >> >> the data looks like JSON, but it's not valid JSON. Have a look at the list >> that's in the data section (which is your BERT encoded data), the first >> character in that list is 131, which is not a valid UTF-8 character, and >> JSON only allows valid UTF-8 characters. With a binary-encoded format, >> there's always a chance for a control character like that to blow up the >> JSON generated before and after the MapReduce code is executed. With JSON, >> content agnosticism only goes as far as the set of legal characters allows. >> >> On a side note, if the data were a valid representation of a string, you >> would see it as a string in the log file as well, not just as a list of >> numbers. >> >> Mathias Meyer >> Developer Advocate, Basho Technologies >> >> >> On Donnerstag, 23. Juni 2011 at 17:31, Andrew Berman wrote: >> >>> But isn't the value itself JSON? Meaning this part: >>> >>> {struct, >>> [{<<"bucket">>,<<"user">>}, >>> {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>}, >>> {<<"vclock">>, >>> >>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>}, >>> {<<"values">>, >>> [{struct, >>> [{<<"metadata">>, >>> {struct, >>> [{<<"X-Riak-VTag">>, >>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>}, >>> {<<"X-Riak-Last-Modified">>, >>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}}, >>> {<<"data">>, >>> >>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]} >>> >>> So the only thing that is not JSON is the data itself, but when I get >>> the value, shouldn't I be getting the all the info above which is JSON >>> encoded? >>> >>> Thank you all for your help, >>> >>> Andrew >>> >>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs >> (mailto:s...@basho.com)> wrote: >>> > The object has to be JSON-encoded to be marshalled into the Javascript VM, >>> > and also on the way out if the Accept header indicates application/json. >>> > So >>> > you have two places where it needs to be encodable into JSON. >>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman >> > (mailto:rexx...@gmail.com)> wrote: >>> > > >>> > > Mathias, >>> > > >>> > > I thought Riak was content agnostic when it came to the data being >>> > > stored? The map phase is not running Riak.mapValuesJson, so why is >>> > > the data itself going through the JSON parser? The JSON value >>> > > returned by v with all the info is valid and I see the struct atom in >>> > > there so mochijson2 can parse it properly, but I'm not clear why >>> > > mochijson2 would be coughing at the data part. >>> > > >>> > > --Andrew >>> > > >>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer >> > > (mailto:math...@basho.com)> wrote: >>> > > > Andrew, >>> > > > >>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary >>> > > > data, >>> > > > and won't make the JSON parser happy when trying to deserialize it, >>> > > > before >>> > > > handing it into the map phase. You have two options here, and none of >>> > > > them >>> > > > will involve JavaScript as the MapReduce language. 
>>> > > > >>> > > > 1.) Use the Protobuff API, use Erlang functions to return the value or >>> > > > object (e.g. riak_mapreduce:map_object_value or >>> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries with >>> > > > the content type 'application/x-erlang-binary'.
Re: Link Walking via Map Reduce
Has there been any talk of using compression, maybe something like Snappy (http://code.google.com/p/snappy/) since it's fast and shouldn't affect performance too much? On Fri, Jun 24, 2011 at 3:29 PM, Aphyr wrote: > Nope. > > On 06/24/2011 03:24 PM, Andrew Berman wrote: >> >> And related, does Bitcask have any sort of compression built into it? > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Controlling order of results from link phase
I've noticed that when I run the link function, it automatically orders the links based on Id. Is there a way to tell it not to sort the links? In other words, I want the links in the order in which they were put in the list (most recent at the head of the list), and I see from Rekon that that is how they are being stored. Basically, I'm running a reduce_slice on the result of the link phase so that Riak doesn't load up all the objects to which the links point. If the answer is no and I cannot control the order of the links, is the only option to prepend something like the time (e.g. millis since 1-1-1970)? Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Controlling order of results from link phase
I think I found the answer and it is no, I cannot control the sort. I found the code here in riak_kv_wm_link_walker.erl:

links(Object) ->
    MDs = riak_object:get_metadatas(Object),
    lists:umerge(
      [ case dict:find(?MD_LINKS, MD) of
            {ok, L} ->
                [ [B,K,T] || {{B,K},T} <- lists:sort(L) ];
            error -> []
        end
        || MD <- MDs ]).

Why run lists:sort? Shouldn't the sort be up to the user after he gets the actual object? I don't understand why the sort processing is necessary at the link phase. Thoughts? --Andrew On Sun, Jun 26, 2011 at 3:15 PM, Andrew Berman wrote: > I've noticed that when I run the link function, it automatically > orders the links based on Id. Is there a way to tell it not to sort > the links? In other words, I want the links in the order in which > they were put in the list (most recent at the head of the list), and I > see from Rekon that that is how they are being stored. Basically, I'm > running a reduce_slice on the result of the link phase so that Riak > doesn't load up all the objects to which the links point. If > the answer is no and I cannot control the order of the links, is the only > option to prepend something like the time (e.g. millis since > 1-1-1970)? > > Thanks, > > Andrew > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Controlling order of results from link phase
Never mind on the lists:sort issue below, as I realized that lists:umerge requires that the lists be sorted. Thanks anyway, Andrew On Sun, Jun 26, 2011 at 3:36 PM, Andrew Berman wrote: > I think I found the answer and it is no, I cannot control the sort. I > found the code here in riak_kv_wm_link_walker.erl:
>
> links(Object) ->
>     MDs = riak_object:get_metadatas(Object),
>     lists:umerge(
>       [ case dict:find(?MD_LINKS, MD) of
>             {ok, L} ->
>                 [ [B,K,T] || {{B,K},T} <- lists:sort(L) ];
>             error -> []
>         end
>         || MD <- MDs ]).
>
> Why run lists:sort? Shouldn't the sort be up to the user after he > gets the actual object? I don't understand why the sort processing is > necessary at the link phase. Thoughts? > > --Andrew > > On Sun, Jun 26, 2011 at 3:15 PM, Andrew Berman wrote: >> I've noticed that when I run the link function, it automatically >> orders the links based on Id. Is there a way to tell it not to sort >> the links? In other words, I want the links in the order in which >> they were put in the list (most recent at the head of the list), and I >> see from Rekon that that is how they are being stored. Basically, I'm >> running a reduce_slice on the result of the link phase so that Riak >> doesn't load up all the objects to which the links point. If >> the answer is no and I cannot control the order of the links, is the only >> option to prepend something like the time (e.g. millis since >> 1-1-1970)? >> >> Thanks, >> >> Andrew >> > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
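The requirement is easy to check in the shell: lists:umerge/1 merges a list of already-sorted lists into one sorted, duplicate-free list, which is why links/1 has to sort each sibling's link list first:

%% each inner list must already be sorted for the merge to be correct
1> lists:umerge([[1,3,5], [2,3,4]]).
[1,2,3,4,5]

Feed it unsorted input and the merge order is no longer meaningful, so the lists:sort call isn't optional.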
Re: How to setup Post-Commit Hooks
They should exist on each node (server-side). You set up the code path using http://wiki.basho.com/Configuration-Files.html#add_paths or -pz in vm.args for each node. Once you do that (assuming Erlang), you just compile the file, put the beams in the directory you added to the path, and then start each node and you should be good to go. If using JavaScript, you want to set http://wiki.basho.com/Configuration-Files.html#js_source_dir which should be pretty self-explanatory. --Andrew On Tue, Jun 28, 2011 at 10:38 AM, Charles Daniel wrote: > I can't figure out how post-commit hooks are to be set up in Riak. I was wondering if > I could get an example of where to set it up (is it via the client or is it > on Riak's server side?). I've read this in the wiki already > http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html#Post-Commit-Hooks > but it doesn't exactly go into much detail of how/where to set it up. > > Thanks in advance > -Chuck > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
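For the Erlang case, a minimal sketch of what such a hook might look like; the module name, log message, and directory are all made up. You'd wire it to a bucket by setting the postcommit bucket property to [{"mod": "my_hooks", "fun": "log_put"}]:

%% my_hooks.erl (hypothetical) -- compile with erlc and put the .beam
%% in a directory listed in add_paths (or -pz) on every node.
-module(my_hooks).
-export([log_put/1]).

%% A post-commit hook is called with the riak_object that was just
%% written; its return value is ignored.
log_put(Object) ->
    error_logger:info_msg("stored ~p/~p~n",
                          [riak_object:bucket(Object),
                           riak_object:key(Object)]).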
Re: New Java Client API
So much better than the old API!! Thanks for the hard work on this. On Tue, Jun 28, 2011 at 10:53 AM, Russell Brown wrote: > Hi, > A while back (in March) I asked for help/feedback in defining a new API for > the riak-java-client[1]. Last week I merged a branch to master of the basho > riak-java-client repo[2] with that new API. > Defining an API is hard. I like Joshua Bloch's advice: > "You can't please everyone so aim to displease everyone equally." > Well, I didn't *aim* to displease anyone, but if you don't like fluent APIs > and builders, you're not going to like this ;) > I had two aims with this re-work > 1. A common API for the different transports (HTTP/PB) > 2. A set of strategies for dealing with fault tolerant, eventually > consistent databases at the client layer > The bulk of the inspiration for the latter aim came from this talk[3] by > Coda Hale and Ryan Kennedy of Yammer (and some follow up emails) as well as > from emails and advice on this list (from Kresten Krab and Jon Brisbin > (amongst others.)) The team at Basho have been brilliant and very patient > answering all my questions about the workings of Riak and the best way for > the client to behave in a given situation. And Mathias Meyer is the best > remote/IM rubber duck[4] in the world. > That said, the implementation (and mistakes) are mainly mine. Please have a > look at the new README, pull the code and play with it. There are some rough > edges, and I have a long TODO list of tidying and extra features, so if you > find a bug, need a feature or have any feedback at all please get in touch, > or even fork the project and send a pull request ;) > Cheers > Russell > [1] > - http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-March/003599.html > [2] - https://github.com/basho/riak-java-client > [3] - http://blog.basho.com/2011/03/28/Riak-and-Scala-at-Yammer/ (if you > haven't watched this yet, please do.) > [4] - http://c2.com/cgi/wiki?RubberDucking > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak or Cassandra for this...
It seems to me that you need to put weights on your requirements because I think it's going to be pretty tough to meet all of them with just one solution. For example, you can use something like Redis to do fast writes, but it doesn't have Map-Reduce queries. So, you can use Redis to write the data and then you can have another program which moves (look into Redis's awesome Pub/Sub features) the data from Redis to Riak or Hadoop where you can then perform your Map-Reduce query. Just my two cents. --Andrew On Tue, Jun 28, 2011 at 8:17 AM, Evans, Matthew wrote: > Hi, > > I've been looking at a number of technologies for a simple application. > > We are saving large amounts of data to disc; this data is event-log/sensor > data which may look something like: > > Version, Account, RequestID, Timestamp, Duration, IPAddr, Method, URL, HTTP > Version, Response_Code, Size, Hit_Rate, Range_From, Range_To, Referrer, > Agent, Content_Type, Accept_Encoding, Redirect_Code, Progress > > > For Example: > > 1 agora 270509382712866522850368375 1289589216.893 1989.938 > 79.7.41.29 GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 > 953772216 725098308 713834308 -1 -1 - > Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) application/octet-stream gzip - > 0 progress > > The data has no specific key to index off (we will be doing some parsing of > the data on ingest to get basic information allowing for fast queries, but > this is outside of Riak). > > Really the issue is that we need to be able to apply "analytic" (map-reduce) > type queries on the data. These queries do not need to be real-time, but > should not take days to run. > > For example: All GET requests for a specific URL within a specific time range. > > The amount of data saved could be quite large (forcing us to use InnoDB > instead of BitCask). One estimate is ~1 billion records. Architecturally this > data could be split over multiple nodes. > > The choice of client-side language is still open, with Erlang as the current > favorite. As I see it the advantages of Riak are: > > 1) HTTP based API as well as Erlang and other client APIs (the system has a > mix of programming languages including Python and C/C++). > > 2) More flexible/extensible data model (Cassandra requires you to predefine > the key spaces, columns etc ahead of time) > > 3) Easier to install/setup without the apparent bloat and complexity of > Cassandra (which also includes Java setup) > > 4) Map-reduce queries > > The disadvantages of Riak are: > > 1) Write performance. We need to handle ~50,000 writes per second. > > I would recommend running our client app from within the same Erlang VM as > Riak so hopefully we can gain something here. Alternatively use innostore > Erlang API directly for writes. > > Questions: > > 1) Is Riak a good database for this application? > > 2) Can we write to InnoDB directly and still leverage the map-reduce queries > on the data? > > Regards > > Matt > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
When is Vclock generated?
I'm looking to store the vclock on my object to be used for versioning. Currently, when I get the object from Riak, I fill in the version with the vclock from Riak (which I use to determine if the object is persistent and for passing back to Riak when putting), and then when the object is saved it saves the version as the previous vclock value. I'm wondering when the vclock is actually generated. Can I write a pre-commit hook that fills in the version so it has the most updated value, or is there no way for me to do it? It's not a huge deal because the version value in the db is immediately updated upon loading the data from Riak, but I just feel like it would make things more consistent if I could have the version match the updated vclock. Thanks for any help! Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: When is Vclock generated?
Sean, Thanks for the reply. I am fetching before write; however, what I want is the ability to fetch the vclock and use it as a version on the object itself throughout my application, much like how Hibernate uses a version property. This way I am able to tell if an object is persistent or not based on whether it has a version or if it is undefined. So basically I created a version property on my record which, in turn, gets stored in Riak, but this version will never match the actual vclock when looking at the data itself, since the version is only ever updated to the previous vclock and not the one after the object has been updated. Does that make more sense? The version I store in Riak is pretty worthless, so it's not a big deal if it doesn't match since I update it to the current one when I load the object, but it would certainly make things easier on me if I could always have the version match the current vclock, so then I wouldn't have to update the version on fetches. --Andrew On Fri, Jul 22, 2011 at 6:06 AM, Sean Cribbs wrote: > Andrew, > Are you trying to store the vclock as part of the value? I'm > misunderstanding something. > Well-behaved clients should always fetch before writing, so your client > should have the most reasonably-fresh version of the object when you write. > There's really no way to guarantee that some other actor won't write a new > version between the time that you fetch the object and store it back, or > even in the time between your client issuing the write and it actually being > written to disk. Vector clocks and sibling values exist in part to help > disambiguate those race conditions. If you're submitting the write without > the vclock, your write could very well be ignored, so please fetch before > storing. > > On Thu, Jul 21, 2011 at 6:13 PM, Andrew Berman wrote: >> >> I'm looking to store the vclock on my object to be used for >> versioning. Currently, when I get the object from Riak I fill in the >> version with the vclock from Riak (which I use to determine if the >> object is persistent and for passing back to Riak when putting) and >> then when the object is saved it saves the version as the previous >> vclock value. I'm wondering when the vclock is actually generated. >> Can I write a pre-commit hook that fills in the version so it has the >> most updated value, or is there no way for me to do it? It's not a >> huge deal because the version value in the db is immediately updated >> upon loading the data from Riak, but I just feel like it would make >> things more consistent if I could have the version match the >> updated vclock. >> >> Thanks for any help! >> >> Andrew >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > > -- > Sean Cribbs > Developer Advocate > Basho Technologies, Inc. > http://www.basho.com/ > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
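For what it's worth, the fetch-before-write cycle Sean describes looks something like this with riakc (bucket, key, and value made up; Pid is assumed to be an open riakc_pb_socket connection). The new vclock only exists after Riak has coordinated the write, which is why the client can't know it up front:

{ok, Obj0} = riakc_pb_socket:get(Pid, <<"user">>, <<"some-key">>),
Vclock = riakc_obj:vclock(Obj0),  %% the "version" as of the fetch
Obj1 = riakc_obj:update_value(Obj0, term_to_binary({user, <<"andrew">>})),
%% the put reuses the vclock carried inside Obj0
ok = riakc_pb_socket:put(Pid, Obj1).

If you do want the post-write vclock without a second fetch, riakc_pb_socket:put/3 accepts a return_body option and hands back the stored object.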
Connection Pool with Erlang PB Client Necessary?
I know that this subject has been brought up before, but I'm still wondering what the value of a connection pool is with Riak. In my app, I'm using Webmachine resources to talk to a gen_server which in turn talks to Riak. So, in other words, the Webmachine resources never talk to Riak directly; they must always talk to the gen_server to deal with Riak. Since Erlang processes are so small and fast to create, is there really any overhead in having the gen_server create a new connection (with the same client id) each time it needs to access Riak? So the pseudo-code would look like this:

my_webmachine_resource.erl
==
some_service:persist(MyRecord).

some_service.erl
==
persist(MyRecord) ->
    riak_repository:load(LoadSomething),
    riak_repository:persist(MyRecord),
    riak_repository:persist(SomethingElse).

riak_repository.erl (this is the gen_server)
==
persist(...) -> call(...)
load(...) -> call(...)

call() ->
    Pid = get_connection(ClientId),
    DoAction(Pid, ),
    close_connection(Pid) %% Is this even necessary?

Thoughts? Another approach I thought of was:

some_service.erl
==
persist(SomeRecord) ->
    riak_repository:execute(fun(Pid) ->
        riak_repository:persist(..., Pid),
        riak_repository:load(, Pid)
    end).

riak_repository.erl
==
execute(Fun) ->
    try
        Pid = get_connection(),
        Fun(Pid)
    after
        close_connection(Pid)
    end

Is one of these approaches better than the other in dealing with Riak and vclocks? Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Connection Pool with Erlang PB Client Necessary?
Thanks for the reply Bryan. This all makes sense. I am fairly new to Erlang and wasn't sure if using a gen_server solved some of the issues with connections. From what I've seen a lot of people simply make calls to Riak directly from a resource and so I thought having a gen_server in front of Riak would help to manage things better. Apparently it doesn't. So, then, two more questions. I have used connection pools in Java like C3P0 and they can ramp up connections and then cull connections when there is a period of inactivity. The only pooler I've found that does this is: https://github.com/seth/pooler . Do you have any other recommendations on connection poolers? Second, I'm still a little confused on client ID. I thought client Id represented an actual client, not a connection. So, in my case, the gen_server is one client which makes multiple connections. After seeing what you wrote and reading a bit more on it, it seems like client Id should just be some random string (base64 encoded) that should be generated on creating a connection. Is that right? Thanks for your help! Andrew On Tue, Jul 26, 2011 at 9:39 AM, Bryan O'Sullivan wrote: > On Mon, Jul 25, 2011 at 4:03 PM, Andrew Berman wrote: >> >> I know that this subject has been brought up before, but I'm still >> wondering what the value of a connection pool is with Riak. > > It's a big deal: > > It amortises TCP and PBC connection setup overhead over a number of > requests, thereby reducing average query latency. > It greatly reduces the likelihood that very busy clients and servers will > run out of limited resources that are effectively invisible, e.g. closed TCP > connections stuck in TIME_WAIT. > > Each of the above is a pretty big deal. Of course, connection pooling isn't > free. > > If you have many clients talking to a server sporadically, you may end up > with large numbers of open-and-idle connections on a server, which will both > consume resources and increase latency for all other clients. This is > usually only a problem with a very large number (many thousands) of clients > per server, and it usually only arises with poorly written and tuned > connection pooling libraries. But ... > ... Most connection pooling libraries are poorly written and tuned, so > they'll behave pathologically just when you need them not to. > Since you don't set up a connection per request, the requests where you *do* > need to set up a connection are going to be more expensive than those where > you don't, so you'll see jitter in your latency profile. About 99.9% of > users will never, ever care about this. >> >> Since Erlang processes are so small and fast to >> create, is there really any overhead in having the gen_server create a >> new connection (with the same client id) each time it needs to access >> Riak? > > Of course. The overhead of Erlang processes has nothing to do with the cost > of setting up a connection. > Also, you really don't want to be using the same client ID repeatedly across > different connections. That's an awesome way to cause bugs with vclock > resolution that end up being very very hard to diagnose. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Connection Pool with Erlang PB Client Necessary?
Thanks for all the replies guys! I just want to make sure I'm totally clear on this. Bob's solution would work well with my design. So basically, this would be the workflow? 1. check out connection from the pool 2. set client id on connection (which would have some static and some random component) 3. perform multiple operations (gets, puts, etc.) which would be seen as a single "transaction" 4. check in the connection to the pool This way once the connection is checked out from the pool, if another user comes along he cannot get that same connection until it has been checked back in, which would meet Justin's requirements. However, each time it's checked out, a new client id is created. Does this sound reasonable and in line with proper client id usage? Thanks again! Andrew On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy wrote: > The simplest guidance on client IDs that I can give: > > If two mutation (PUT) operations could occur concurrently or without > awareness of each other, then they should have different client IDs. > > As a result of the above: if you are sharing a connection, then you > should use a different client ID for each separate user of that > connection. > > -Justin > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
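Sketched as code, with my_pool:check_out/0 and my_pool:check_in/1 as hypothetical stand-ins for whatever the pool actually exposes:

%% in your service/gen_server module
with_riak(Fun) ->
    Pid = my_pool:check_out(),                       %% step 1
    %% step 2: fresh client id per checkout, static prefix plus random part
    ClientId = iolist_to_binary(["web-", base64:encode(crypto:rand_bytes(6))]),
    riakc_pb_socket:set_client_id(Pid, ClientId),
    try
        Fun(Pid)                                     %% step 3: gets/puts
    after
        my_pool:check_in(Pid)                        %% step 4
    end.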
Re: Connection Pool with Erlang PB Client Necessary?
Awesome! Thanks for all your help guys. On Tue, Jul 26, 2011 at 12:20 PM, Justin Sheehy wrote: > Yes, Andrew -- that is a fine approach to using a connection pool. > > Go for it. > > -Justin > > > > On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman wrote: >> Thanks for all the replies guys! >> >> I just want to make sure I'm totally clear on this. Bob's solution >> would work well with my design. So basically, this would be the >> workflow? >> >> 1. check out connection from the pool >> 2. set client id on connection (which would have some static and some >> random component) >> 3. perform multiple operations (gets, puts, etc.) which would be seen >> as a single "transaction" >> 4. check in the connection to the pool >> >> This way once the connection is checked out from the pool, if another >> user comes along he cannot get that same connection until it has been >> checked back in, which would meet Justin's requirements. However, >> each time it's checked out, a new client id is created. >> >> Does this sound reasonable and in line with proper client id usage? >> >> Thanks again! >> >> Andrew >> >> >> On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy wrote: >>> The simplest guidance on client IDs that I can give: >>> >>> If two mutation (PUT) operations could occur concurrently or without >>> awareness of each other, then they should have different client IDs. >>> >>> As a result of the above: if you are sharing a connection, then you >>> should use a different client ID for each separate user of that >>> connection. >>> >>> -Justin >>> >> > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Connection Pool with Erlang PB Client Necessary?
Cool, I'll check it out, though there appears to be something wrong with your account as when I try to view the source, I get an error back from GitHub. On Thu, Jul 28, 2011 at 1:55 PM, Joel Meyer wrote: > > > On Tue, Jul 26, 2011 at 11:35 AM, Andrew Berman wrote: >> >> Thanks for the reply Bryan. This all makes sense. I am fairly new to >> Erlang and wasn't sure if using a gen_server solved some of the issues >> with connections. From what I've seen a lot of people simply make >> calls to Riak directly from a resource and so I thought having a >> gen_server in front of Riak would help to manage things better. >> Apparently it doesn't. >> >> So, then, two more questions. I have used connection pools in Java >> like C3P0 and they can ramp up connections and then cull connections >> when there is a period of inactivity. The only pooler I've found that >> does this is: https://github.com/seth/pooler . Do you have any other >> recommendations on connection poolers? > > I'm late to the party, but you could take a look at gen_server_pool > (https://github.com/openx/gen_server_pool). It's a pooling library I wrote > to provide pooling of gen_servers. I've used it mostly for Thrift clients, > but Anthony (also on the list) uses it to pool riak_pb clients in > webmachine. The basic idea is that you'd call > gen_server_pool:start_link(...) wherever you'd normally call > gen_server:start_link(...) and pass in a few extra args that control min and > max pool size, as well as idle timeout. You can use the Pid you get back > from that the same way you'd use the pid of your gen_server, except that all > work gets dispatched to a member of a pool instead of a single gen_server. > To be honest, I haven't tested out the open-source version I posted on > GitHub (sorry, I've been busy), but it's just a slightly modified version of > the internal library that's been used in production for several months with > good results. > Cheers, > Joel > >> >> Second, I'm still a little confused on client ID. I thought client Id >> represented an actual client, not a connection. So, in my case, the >> gen_server is one client which makes multiple connections. After >> seeing what you wrote and reading a bit more on it, it seems like >> client Id should just be some random string (base64 encoded) that >> should be generated on creating a connection. Is that right? >> >> Thanks for your help! >> >> Andrew >> >> On Tue, Jul 26, 2011 at 9:39 AM, Bryan O'Sullivan >> wrote: >> > On Mon, Jul 25, 2011 at 4:03 PM, Andrew Berman >> > wrote: >> >> >> >> I know that this subject has been brought up before, but I'm still >> >> wondering what the value of a connection pool is with Riak. >> > >> > It's a big deal: >> > >> > It amortises TCP and PBC connection setup overhead over a number of >> > requests, thereby reducing average query latency. >> > It greatly reduces the likelihood that very busy clients and servers >> > will >> > run out of limited resources that are effectively invisible, e.g. closed >> > TCP >> > connections stuck in TIME_WAIT. >> > >> > Each of the above is a pretty big deal. Of course, connection pooling >> > isn't >> > free. >> > >> > If you have many clients talking to a server sporadically, you may end >> > up >> > with large numbers of open-and-idle connections on a server, which will >> > both >> > consume resources and increase latency for all other clients. 
This is >> > usually only a problem with a very large number (many thousands) of >> > clients >> > per server, and it usually only arises with poorly written and tuned >> > connection pooling libraries. But ... >> > ... Most connection pooling libraries are poorly written and tuned, so >> > they'll behave pathologically just when you need them not to. >> > Since you don't set up a connection per request, the requests where you >> > *do* >> > need to set up a connection are going to be more expensive than those >> > where >> > you don't, so you'll see jitter in your latency profile. About 99.9% of >> > users will never, ever care about this. >> >> >> >> Since Erlang processes are so small and fast to >> >> create, is there really any overhead in having the gen_server create a >> >> new connection (with the same client id) each time it >> >> needs to access Riak?
Re: Connection Pool with Erlang PB Client Necessary?
So I looked at a bunch of pooling applications and none of them really have the functionality and flexibility I'm used to with Java connection pools. So, I created my own OTP pooling application, Pooly. It supports multiple pools, offers flexible per-pool configuration (idle timeout, max age of processes, initial count, acquire increment, max pool size, and min pool size), and shrinks the pool based on those parameters. Feel free to check it out: https://github.com/aberman/pooly --Andrew On Tue, Jul 26, 2011 at 12:20 PM, Justin Sheehy wrote: > Yes, Andrew -- that is a fine approach to using a connection pool. > > Go for it. > > -Justin > > > > On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman wrote: >> Thanks for all the replies guys! >> >> I just want to make sure I'm totally clear on this. Bob's solution >> would work well with my design. So basically, this would be the >> workflow? >> >> 1. check out connection from the pool >> 2. set client id on connection (which would have some static and some >> random component) >> 3. perform multiple operations (gets, puts, etc.) which would be seen >> as a single "transaction" >> 4. check in the connection to the pool >> >> This way once the connection is checked out from the pool, if another >> user comes along he cannot get that same connection until it has been >> checked back in, which would meet Justin's requirements. However, >> each time it's checked out, a new client id is created. >> >> Does this sound reasonable and in line with proper client id usage? >> >> Thanks again! >> >> Andrew >> >> >> On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy wrote: >>> The simplest guidance on client IDs that I can give: >>> >>> If two mutation (PUT) operations could occur concurrently or without >>> awareness of each other, then they should have different client IDs. >>> >>> As a result of the above: if you are sharing a connection, then you >>> should use a different client ID for each separate user of that >>> connection. >>> >>> -Justin >>> >> > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Using Secondary Index Result
Hello, I'm currently using the Riak Erlang client and when I do a get_index I only get the keys back. So, my question is, is it better to get the keys, loop through them and run a get on them one by one, or is it better to write my own MapRed job which queries the index and then runs a map phase using the function map_object_value. I remember reading somewhere that you're better off running gets on multiple keys vs using a MapRed job, but is this still the case for this use case and with Riak 1.0? Thanks, Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
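To make the two options concrete, a rough sketch with the bucket, index, and match value made up, and Pid assumed to be an open riakc_pb_socket connection. The exact shape of the get_index result varies between client versions, and the {index, ...} MapReduce input form depends on your Riak/client version, so treat this as a sketch:

%% 1) index query, then one get per key
{ok, Keys} = riakc_pb_socket:get_index(Pid, <<"user">>,
                                       <<"email_bin">>, <<"a@example.com">>),
Objs = [begin {ok, O} = riakc_pb_socket:get(Pid, <<"user">>, K), O end
        || K <- Keys].

%% 2) one MapReduce job with the index query as its input
{ok, [{0, Values}]} =
    riakc_pb_socket:mapred(
        Pid,
        {index, <<"user">>, <<"email_bin">>, <<"a@example.com">>},
        [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).

The first costs a round trip per key but keeps each fetch a plain get; the second is a single request but runs through the MapReduce machinery, so it's worth benchmarking both against your own data.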
Re: Lager AMQP backend
Awesome stuff Jon! On Thu, Nov 17, 2011 at 1:49 PM, Andrew Thompson wrote: > On Thu, Nov 17, 2011 at 02:05:58PM -0600, Jon Brisbin wrote: > > I pushed to my Github a Lager backend for sending log messages to > RabbitMQ via AMQP: > > > > https://github.com/jbrisbin/lager_amqp_backend > > > > It uses a re-connecting connection pool for sending messages, so it's > pretty fast and will automatically recover if RabbitMQ goes down (but it > does *not*, at the moment, internally queue log messages if it can't > connect to the broker). > > > > The idea is to aggregate logging from riak_core applications, but you > should be able to use it in Riak/DB as well. > > > Many thanks, it looks good. I've added a link to it from lager's README. > > Andrew > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Integrating Riak
Hello, I'm a little confused on how I would go about integrating Riak into an Erlang application. Here's my use case. I'm creating an HTTP proxy using Misultin which intercepts any requests to my backend REST services and provides all the session handling. So, I would like to use Riak to store the sessions for the front-end. Since my proxy is written in Erlang, I figured it makes more sense to have it run on the same node as Riak and use the local Riak Erlang client to speed things up. So, questions: 1. Would I just depend on riak_kv for my app? 2. How do I go about configuring the riak application from within my application? I can't find any documentation on this. 3. How do I get riak on a node to join the other nodes? 4. Does it make more sense to just install a riak package and use the erlang pb client? Seems like it would be less efficient especially since these will live on the same machine. Thanks for any help! Andrew ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Integrating Riak
Ok, cool, thanks Dave, makes sense. Keep up the great work! On Tue, Nov 22, 2011 at 9:31 PM, David Smith wrote: > On Tue, Nov 22, 2011 at 4:36 PM, Andrew Berman wrote: > > 4. Does it make more sense to just install a riak package and use the > > erlang pb client? Seems like it would be less efficient especially since > > these will live on the same machine. > > This is the preferred way to attack this problem. Separation of the > functionality by O/S processes is appropriate and much easier to > reason about in error situations. Loopback sockets to the PBs > interface should be within a 1 ms of total request handling time. > > D. > > -- > Dave Smith > Director, Engineering > Basho Technologies, Inc. > diz...@basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
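Concretely, David's suggestion is just a loopback PB connection from the proxy's node; a sketch, with the sessions bucket and key made up:

{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),  %% default PB port
pong = riakc_pb_socket:ping(Pid),
{ok, Session} = riakc_pb_socket:get(Pid, <<"sessions">>, <<"some-session-id">>).

The point is only that the client talks to the loopback address of the node it shares a machine with, keeping the two O/S processes separate.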
Re: Best practices for using the PB client
You should look into using HAProxy in front of your nodes. Let HAProxy load balance between all your nodes, and then if one goes down, HAProxy just pulls it out of the load-balancing cluster automatically until it is restored. Then your pooler can just pool connections through HAProxy instead, so it doesn't have to worry at all about failed nodes. Also, shameless plug, I have a pooling app as well which has a few more options than pooler. You can check it out here: https://github.com/aberman/pooly --Andrew On Fri, Dec 30, 2011 at 9:58 AM, Marc Campbell wrote: > Hey all, > > I'm looking for some best practices in handling connections when using the > protocol buffer client. Specifically, I have 3 nodes in my cluster, and > need to figure out how to handle the situation when one of the nodes is > down. > > I'm currently using a pooler app (https://github.com/seth/pooler) and > this helps me distribute the load to all of the nodes, but when one goes > down, the app doesn't recover nicely. > > I'm about to write some code in my app to handle this, but before I do, I > thought I'd check for existing solutions and best practices: > > - Is there an existing connection pooling mechanism that someone has > created which handles node failures automatically? > > If not, then I'm looking forward to writing it! > > Thanks in advance, > Marc > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
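A minimal haproxy.cfg along those lines might look like the sketch below; the node addresses and names are made up, and each server line carries a health check so a downed node drops out of rotation automatically:

# round-robin TCP proxying across the Riak PB port on each node
listen riak_pb
    bind 127.0.0.1:8087
    mode tcp
    balance roundrobin
    server riak1 10.0.1.1:8087 check
    server riak2 10.0.1.2:8087 check
    server riak3 10.0.1.3:8087 check

The pool then only ever opens connections to 127.0.0.1:8087 and never needs to know which nodes are actually alive.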