Hello,
I created a feature request for compression of values about a year ago
(https://issues.basho.com/show_bug.cgi?id=412), but unfortunately there
seems to have been no interest in it.
For the time being we have to patch Riak to enable compression; we do it
in the riak_kv_vnode module by changing the calls to term_to_binary(Obj)
to term_to_binary(Obj, [compressed]).
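For reference, a minimal shell sketch of what the change amounts to (the
sample term below is made up; the actual patch only swaps the
term_to_binary/1 calls inside riak_kv_vnode):

    Obj    = {user, <<"k">>, lists:duplicate(200, {field, <<"value">>})},
    Plain  = term_to_binary(Obj),                %% what stock Riak stores
    Packed = term_to_binary(Obj, [compressed]),  %% what the patched vnode stores
    io:format("plain: ~p bytes, compressed: ~p bytes~n",
              [byte_size(Plain), byte_size(Packed)]).
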
In our case we see compression ratios of around 1.7-1.8, even though the
data is already quite compact: we mostly store Thrift-serialized objects
which contain hardly any textual content worth mentioning.
Cheers,
Nico
On Friday, 24.06.2011, at 15:24 -0700, Andrew Berman wrote:
> And related, does Bitcask have any sort of compression built into it?
>
> On Fri, Jun 24, 2011 at 2:58 PM, Andrew Berman wrote:
> > Mathias,
> >
> > I took the BERT encoding and then encoded that as Base64, which should
> > pass the test of valid UTF-8 characters. However, I'm now starting to
> > think that doing two encodings just to save space is not worth the
> > performance trade-off versus simply storing the data in JSON format.
> > Do you guys have any thoughts on this?
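> >
> > Roughly what the double encoding looks like, as an illustrative sketch
> > only (the value is made up, not my actual code):
> >
> >     Value  = [{name, <<"andrew">>}, {logins, 42}],
> >     Bert   = term_to_binary(Value),
> >     Stored = base64:encode(Bert),          %% ASCII only, so JSON-safe
> >     Value  = binary_to_term(base64:decode(Stored)),  %% read path round-trips
> >
> > so every value pays both the BERT overhead and the roughly 33% Base64
> > blow-up on top of it.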
> >
> > Thanks,
> >
> > Andrew
> >
> > On Thu, Jun 23, 2011 at 8:39 AM, Mathias Meyer wrote:
> >> Andrew,
> >>
> >> the data looks like JSON, but it's not valid JSON. Have a look at the list
> >> in the data section (which is your BERT-encoded data): the first byte in
> >> that list is 131, which is not valid UTF-8, and JSON only allows valid
> >> UTF-8 characters. With a binary-encoded format, there's always a chance
> >> for a control character like that to blow up the JSON generated before
> >> and after the MapReduce code is executed. With JSON, content agnosticism
> >> only goes as far as the set of legal characters allows.
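> >>
> >> As a quick illustration of why that byte shows up at all (the term here
> >> is arbitrary): every term_to_binary/1 result starts with byte 131, the
> >> external term format version tag, so BERT output can never be valid
> >> UTF-8 text.
> >>
> >>     <<131, _/binary>> = term_to_binary({any, "term"}).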
> >>
> >> On a side note, if the data were a valid representation of a string, you
> >> would see it as a string in the log file as well, not just as a list of
> >> numbers.
> >>
> >> Mathias Meyer
> >> Developer Advocate, Basho Technologies
> >>
> >>
> >> On Thursday, 23 June 2011 at 17:31, Andrew Berman wrote:
> >>
> >>> But isn't the value itself JSON? Meaning this part:
> >>>
> >>> {struct,
> >>> [{<<"bucket">>,<<"user">>},
> >>> {<<"key">>,<<"LikiWUPJSFuxtrhCYpsPfg">>},
> >>> {<<"vclock">>,
> >>>
> >>> <<"a85hYGBgzGDKBVIsLKaZdzOYEhnzWBmes6Yd58sCAA==">>},
> >>> {<<"values">>,
> >>> [{struct,
> >>> [{<<"metadata">>,
> >>> {struct,
> >>> [{<<"X-Riak-VTag">>,
> >>> <<"1KnL9Dlma9Yg4eMhRuhwtx">>},
> >>> {<<"X-Riak-Last-Modified">>,
> >>> <<"Fri, 10 Jun 2011 03:05:11 GMT">>}]}},
> >>> {<<"data">>,
> >>>
> >>> <<131,108,0,0,0,18,104,2,100,0,6,114,...>>}]}]}
> >>>
> >>> So the only thing that is not JSON is the data itself, but when I get
> >>> the value, shouldn't I be getting all the info above, which is JSON
> >>> encoded?
> >>>
> >>> Thank you all for your help,
> >>>
> >>> Andrew
> >>>
> >>> On Thu, Jun 23, 2011 at 8:17 AM, Sean Cribbs <s...@basho.com> wrote:
> >>> > The object has to be JSON-encoded to be marshalled into the Javascript
> >>> > VM, and also on the way out if the Accept header indicates
> >>> > application/json. So you have two places where it needs to be encodable
> >>> > into JSON.
> >>> > On Thu, Jun 23, 2011 at 11:14 AM, Andrew Berman <rexx...@gmail.com> wrote:
> >>> > >
> >>> > > Mathias,
> >>> > >
> >>> > > I thought Riak was content-agnostic when it came to the data being
> >>> > > stored? The map phase is not running Riak.mapValuesJson, so why is
> >>> > > the data itself going through the JSON parser? The JSON value
> >>> > > returned by v with all the info is valid, and I see the struct atom
> >>> > > in there so mochijson2 can parse it properly, but I'm not clear why
> >>> > > mochijson2 would be choking on the data part.
> >>> > >
> >>> > > --Andrew
> >>> > >
> >>> > > On Thu, Jun 23, 2011 at 5:32 AM, Mathias Meyer <math...@basho.com> wrote:
> >>> > > > Andrew,
> >>> > > >
> >>> > > > you're indeed hitting a JSON encoding problem here. BERT is binary
> >>> > > > data, and it won't make the JSON parser happy when it tries to
> >>> > > > deserialize it before handing it into the map phase. You have two
> >>> > > > options here, and neither of them involves JavaScript as the
> >>> > > > MapReduce language.
> >>> > > >
> >>> > > > 1.) Use the Protobuffs API, use Erlang functions to return the
> >>> > > > value or object (e.g. riak_kv_mapreduce:map_object_value or
> >>> > > > riak_kv_mapreduce:map_identity), and then run MapReduce queries
> >>> > > > with the content type 'application/x-erlang-binary'. However,
> >>> > > > you're constrained by client libraries here, e.g. Ruby and Python
> >>> > > > don't support this content type for MapReduce on the Protobuffs
> >>> > > > interface yet, so you'd either implement
> >
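> >>> > > >
> >>> > > > For illustration, a rough sketch of what option 1 could look like
> >>> > > > from the Erlang PB client (riakc); the host, port and bucket/key
> >>> > > > here are assumptions, and the exact shape of the result may differ:
> >>> > > >
> >>> > > >     {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
> >>> > > >     %% Native Erlang map phase that just returns the object values.
> >>> > > >     {ok, Results} =
> >>> > > >         riakc_pb_socket:mapred(
> >>> > > >             Pid,
> >>> > > >             [{<<"user">>, <<"LikiWUPJSFuxtrhCYpsPfg">>}],
> >>> > > >             [{map, {modfun, riak_kv_mapreduce, map_object_value},
> >>> > > >               none, true}]).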