Hello,

I created a feature request for compression of values about a year ago (https://issues.basho.com/show_bug.cgi?id=412), but unfortunately there seems to have been no interest in it. For the time being we patch Riak ourselves to enable compression: in the riak_kv_vnode module we change the calls to term_to_binary(Obj) to term_to_binary(Obj, [compressed]).
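To make the change concrete, here is a minimal sketch of what the compressed option does outside of Riak. The module and function names (compress_demo, encode/1, decode/1, ratio/1) are made up for illustration and are not part of Riak; only term_to_binary/1,2, binary_to_term/1 and byte_size/1 are standard Erlang BIFs.

```erlang
-module(compress_demo).
-export([encode/1, decode/1, ratio/1]).

%% Hypothetical helper mirroring the patched call: serialize a term to
%% the external term format with zlib compression enabled.
encode(Obj) ->
    term_to_binary(Obj, [compressed]).

%% binary_to_term/1 accepts both compressed and uncompressed binaries,
%% so decoding looks the same either way.
decode(Bin) ->
    binary_to_term(Bin).

%% Size of the plain encoding divided by the size of the compressed one.
%% Values above 1.0 mean the compressed form is smaller.
ratio(Obj) ->
    byte_size(term_to_binary(Obj)) / byte_size(encode(Obj)).
```

Since binary_to_term/1 handles both forms, code that decodes the stored binary with it should not need a matching change. The ratio depends entirely on the data, so it is worth measuring on a sample of real objects.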
In our case we see compression ratios of around 1.7-1.8, even though the data is already quite compact: we mostly store Thrift-serialized objects, which contain no textual content worth mentioning.

Cheers,
Nico

On Friday, 24.06.2011, at 15:24 -0700, Andrew Berman wrote:
> And related, does Bitcask have any sort of compression built into it?
>
> On Fri, Jun 24, 2011 at 2:58 PM, Andrew Berman <rexx...@gmail.com> wrote:
> > Mathias,
> >
> > I took the BERT encoding and then encoded that as Base64, which should
> > pass the test of valid UTF-8 characters. However, now I'm starting to
> > think that maybe doing two encodings and storing that for the purpose
> > of saving space is not worth the trade-off in performance vs just
> > storing the data in JSON format. Do you guys have any thoughts on
> > this?
> >
> > Thanks,
> >
> > Andrew
> >
> > On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer <math...@basho.com> wrote:
> >> Andrew,
> >>
> >> the data looks like JSON, but it's not valid JSON. Have a look at the
> >> list that's in the data section (which is your BERT-encoded data). The
> >> first byte in that list is 131, which is not a valid UTF-8 character on
> >> its own, and JSON only allows valid UTF-8 characters. With a
> >> binary-encoded format, there's always a chance for a byte like that to
> >> blow up the JSON generated before and after the MapReduce code is
> >> executed. With JSON, content agnosticism only goes as far as the set of
> >> legal characters allows.
> >>
> >> On a side note, if the data were a valid representation of a string, you
> >> would see it as a string in the log file as well, not just as a list of
> >> numbers.
> >>
> >> Mathias Meyer
> >> Developer Advocate, Basho Technologies
> >>
> >>
> >> On Thursday, 23 June 2011 at 17:31, Andrew Berman wrote:
> >>
> >>> But isn't the value itself JSON? Meaning this part:
> >>>
> >>> {struct,
> >>>  [{<<"bucket">>,<<"user">>},
> >>>   {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
> >>>   {<<"vclock">>,
> >>>    <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
> >>>   {<<"values">>,
> >>>    [{struct,
> >>>      [{<<"metadata">>,
> >>>        {struct,
> >>>         [{<<"X-Riak-VTag">>,
> >>>           <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
> >>>          {<<"X-Riak-Last-Modified">>,
> >>>           <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
> >>>       {<<"data">>,
> >>>        <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
> >>>
> >>> So the only thing that is not JSON is the data itself, but when I get
> >>> the value, shouldn't I be getting all the info above, which is JSON
> >>> encoded?
> >>>
> >>> Thank you all for your help,
> >>>
> >>> Andrew
> >>>
> >>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs <s...@basho.com> wrote:
> >>> > The object has to be JSON-encoded to be marshalled into the
> >>> > JavaScript VM, and also on the way out if the Accept header indicates
> >>> > application/json. So you have two places where it needs to be
> >>> > encodable into JSON.
> >>> >
> >>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman <rexx...@gmail.com> wrote:
> >>> > >
> >>> > > Mathias,
> >>> > >
> >>> > > I thought Riak was content agnostic when it came to the data being
> >>> > > stored? The map phase is not running Riak.mapValuesJson, so why is
> >>> > > the data itself going through the JSON parser? The JSON value
> >>> > > returned by v with all the info is valid and I see the struct atom in
> >>> > > there so mochijson2 can parse it properly, but I'm not clear why
> >>> > > mochijson2 would be coughing at the data part.
> >>> > >
> >>> > > --Andrew
> >>> > >
> >>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer <math...@basho.com> wrote:
> >>> > > > Andrew,
> >>> > > >
> >>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary
> >>> > > > data, and won't make the JSON parser happy when trying to
> >>> > > > deserialize it, before handing it into the map phase. You have two
> >>> > > > options here, and neither of them involves JavaScript as the
> >>> > > > MapReduce language.
> >>> > > >
> >>> > > > 1.) Use the Protobuffs API, use Erlang functions to return the
> >>> > > > value or object (e.g. riak_kv_mapreduce:map_object_value or
> >>> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries
> >>> > > > with the content type 'application/x-erlang-binary'. However,
> >>> > > > you're constrained by client libraries here, e.g. Ruby and Python
> >>> > > > don't support this content type for MapReduce on the Protobuffs
> >>> > > > interface yet, so you'd either implement something custom or
> >>> > > > resort to a client that does. riak-erlang-client comes to mind,
> >>> > > > though it was proven to be possible using the Java client too,
> >>> > > > thanks to Russell. See [1] and [2].
> >>> > > >
> >>> > > > 2.) Convert the result from BERT into a JSON-parseable structure
> >>> > > > inside an Erlang map function, before it's returned to the client.
> >>> > > >
> >>> > > > The second approach is certainly less restrictive in terms of API
> >>> > > > usage and less prone to encoding/decoding issues with JSON, but it
> >>> > > > involves some overhead inside the MapReduce request itself.
> >>> > > >
> >>> > > > Mathias Meyer
> >>> > > > Developer Advocate, Basho Technologies
> >>> > > >
> >>> > > > [1] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004447.html
> >>> > > > [2] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004485.html
> >>> > > >
> >>> > > > On Thursday, 23 June 2011 at 07:59, Andrew Berman wrote:
> >>> > > >
> >>> > > > > Hey Ryan,
> >>> > > > >
> >>> > > > > Here is the error from the sasl log. It looks like some sort of
> >>> > > > > encoding error. Any thoughts on how to fix this? I am storing the
> >>> > > > > data as BERT encoded binary and I set the content-type as
> >>> > > > > application/octet-stream.
> >>> > > > >
> >>> > > > > Thanks for your help!
> >>> > > > >
> >>> > > > > Andrew
> >>> > > > >
> >>> > > > > =ERROR REPORT==== 9-Jun-2011::21:37:05 ===
> >>> > > > > ** Generic server <0.5996.21> terminating
> >>> > > > > ** Last message in was {batch_dispatch,
> >>> > > > >   {map,
> >>> > > > >    {jsanon,<<"function(value) {return [value];}">>},
> >>> > > > >    [{struct,
> >>> > > > >      [{<<"bucket">>,<<"user">>},
> >>> > > > >       {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
> >>> > > > >       {<<"vclock">>,
> >>> > > > >        <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
> >>> > > > >       {<<"values">>,
> >>> > > > >        [{struct,
> >>> > > > >          [{<<"metadata">>,
> >>> > > > >            {struct,
> >>> > > > >             [{<<"X-Riak-VTag">>,
> >>> > > > >               <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
> >>> > > > >              {<<"X-Riak-Last-Modified">>,
> >>> > > > >               <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
> >>> > > > >           {<<"data">>,
> >>> > > > >            <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}]},
> >>> > > > >    <<"user">>,none]}}
> >>> > > > > ** When Server state ==
> >>> > > > > {state,<0.143.0>,riak_kv_js_map,#Port<0.92614>,true}
> >>> > > > > ** Reason for termination ==
> >>> > > > > ** {function_clause,[{js_driver,eval_js,
> >>> > > > >      [#Port<0.92614>,{error,bad_encoding},5000]},
> >>> > > > >     {riak_kv_js_vm,invoke_js,2},
> >>> > > > >     {riak_kv_js_vm,define_invoke_anon_js,3},
> >>> > > > >     {riak_kv_js_vm,handle_call,3},
> >>> > > > >     {gen_server,handle_msg,5},
> >>> > > > >     {proc_lib,init_p_do_apply,3}]}
> >>> > > > >
> >>> > > > > =CRASH REPORT==== 9-Jun-2011::21:37:05 ===
> >>> > > > > crasher:
> >>> > > > > initial call: riak_kv_js_vm:init/1
> >>> > > > > pid: <0.5996.21>
> >>> > > > > registered_name: []
> >>> > > > > exception exit:
> >>> > > > > {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
> >>> > > > > in function gen_server:terminate/6
> >>> > > > > in call from proc_lib:init_p_do_apply/3
> >>> > > > > ancestors: [riak_kv_js_sup,riak_kv_sup,<0.128.0>]
> >>> > > > > messages: []
> >>> > > > > links: [<0.142.0>,<0.6009.21>]
> >>> > > > > dictionary: []
> >>> > > > > trap_exit: false
> >>> > > > > status: running
> >>> > > > > heap_size: 4181
> >>> > > > > stack_size: 24
> >>> > > > > reductions: 2586
> >>> > > > > neighbours:
> >>> > > > > neighbour:
> >>> > > > > [{pid,<0.6009.21>},{registered_name,[]},{initial_call,{riak_kv_mapper,init,[Argument__1]}},{current_function,{gen,do_call,4}},{ancestors,[riak_kv_mapper_sup,riak_kv_sup,<0.128.0>]},{messages,[]},{links,[<0.5996.21>,<12337.6227.21>,<0.162.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,987},{stack_size,53},{reductions,1043}]
> >>> > > > >
> >>> > > > > =SUPERVISOR REPORT==== 9-Jun-2011::21:37:05 ===
> >>> > > > > Supervisor: {local,riak_kv_js_sup}
> >>> > > > > Context: child_terminated
> >>> > > > > Reason:
> >>> > > > > {function_clause,[{js_driver,eval_js,[#Port<0.92614>,{error,bad_encoding},5000]},{riak_kv_js_vm,invoke_js,2},{riak_kv_js_vm,define_invoke_anon_js,3},{riak_kv_js_vm,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
> >>> > > > > Offender:
> >>> > > > > [{pid,<0.5996.21>},{name,undefined},{mfargs,{riak_kv_js_vm,start_link,undefined}},{restart_type,temporary},{shutdown,2000},{child_type,worker}]
> >>> > > > >
> >>> > > > > On Wed, Jun 22, 2011 at 6:10 PM, Ryan Zezeski <rzeze...@basho.com> wrote:
> >>> > > > > >
> >>> > > > > > Andrew,
> >>> > > > > > Maybe you could elaborate on the error? I tested this against
> >>> > > > > > master (commit below) just now with success.
> >>> > > > > > 2b1a474f836d962fa035f48c05452e22fc6c2193 Change dependency to
> >>> > > > > > allow for R14B03 as well as R14B02
> >>> > > > > > -Ryan
> >>> > > > > > On Wed, Jun 22, 2011 at 7:03 PM, Andrew Berman <rexx...@gmail.com> wrote:
> >>> > > > > > >
> >>> > > > > > > Hello,
> >>> > > > > > > I'm having issues link walking using the Map Reduce link
> >>> > > > > > > function. I am using HEAD from Git, so it's possible that's
> >>> > > > > > > the issue, but here is what is happening.
> >>> > > > > > > I've got two buckets, user and user_email, where user_email
> >>> > > > > > > contains a link to the user.
> >>> > > > > > > When I run this:
> >>> > > > > > > {
> >>> > > > > > >   "inputs": [
> >>> > > > > > >     [
> >>> > > > > > >       "user_email",
> >>> > > > > > >       "myem...@email.com"
> >>> > > > > > >     ]
> >>> > > > > > >   ],
> >>> > > > > > >   "query": [
> >>> > > > > > >     {
> >>> > > > > > >       "link": {
> >>> > > > > > >         "bucket": "user",
> >>> > > > > > >         "tag": "user"
> >>> > > > > > >       }
> >>> > > > > > >     }
> >>> > > > > > >   ]
> >>> > > > > > > }
> >>> > > > > > > I only get [["user","LikiWUPJSFuxtrhCYpsPfg","user"]]
> >>> > > > > > > returned. The second I add a map function, even the simplest
> >>> > > > > > > one (function(v) { return [v]; }), I get a "map_reduce error":
> >>> > > > > > > {
> >>> > > > > > >   "inputs": [
> >>> > > > > > >     [
> >>> > > > > > >       "user_email",
> >>> > > > > > >       "myem...@email.com"
> >>> > > > > > >     ]
> >>> > > > > > >   ],
> >>> > > > > > >   "query": [
> >>> > > > > > >     {
> >>> > > > > > >       "link": {"bucket":"user", "tag":"user"}
> >>> > > > > > >     }
> >>> > > > > > >     ,{
> >>> > > > > > >       "map": {
> >>> > > > > > >         "language": "javascript",
> >>> > > > > > >         "source": "function(v) { return[v]; }"
> >>> > > > > > >       }
> >>> > > > > > >     }
> >>> > > > > > >   ]
> >>> > > > > > > }
> >>> > > > > > > Is this functionality broken? I am following what it says on
> >>> > > > > > > the Wiki for the MapRed version of link walking. When I use
> >>> > > > > > > HTTP link walking, it works correctly.
> >>> > > > > > > Thanks,
> >>> > > > > > > Andrew
> >>> > > > > > > _______________________________________________
> >>> > > > > > > riak-users mailing list
> >>> > > > > > > riak-users@lists.basho.com
> >>> > > > > > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >>> >
> >>> > --
> >>> > Sean Cribbs <s...@basho.com>
> >>> > Developer Advocate
> >>> > Basho Technologies, Inc.
> >>> > http://www.basho.com/