Gotcha, thanks for the background.
I think as long as you can preserve the same level of compatibility with
the other lexicoders, this would be a nice addition. If it's an itch you
want to scratch, others probably will want to do the same too :)
Keith probably knows the most about what currently works off the top of
his head (since he wrote the Lexicoders, IIRC), but I imagine he's
taking some time off work and isn't watching the mailing list closely.
If you get stuck with how to implement this, let me know and I can try
to poke around at the implementation too.
Adam J. Shook wrote:
Hi Josh,
Thanks for the advice. I'm with you on using the CQ and Value instead
of putting the whole map into a Value, but what I am working on uses
a relational model for mapping data to Accumulo and expects the value
of the cell to be in the Value. Certainly some optimization
opportunities by using the 'better' ways for storing data in Accumulo,
but I'd like to get this working before diving into that rabbit hole.
From a brief look, the ListLexicoder encodes each element of the list using
a sub-lexicoder and escapes each element (0x00 -> 0x01 0x01 and 0x01 ->
0x01 0x02). The voodoo here escapes me a little (pun!), but it seems to
be enough to enable multi-dimensional arrays encoded by nesting
ListLexicoders (up to 4D, haven't tried a fifth dimension). I would
expect something similar could be done using a Map. Would a
MapLexicoder be something worth contributing to the project? I'd be
happy to give it a stab.
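
For reference, the escaping described above can be sketched in plain Java with no Accumulo dependency. This is only an illustration, not the ListLexicoder source: the names EscapeSketch, escape, unescape, join and split are made up here, and joining the escaped elements with a single 0x00 byte is an assumption about how the elements are separated.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not Accumulo API.
final class EscapeSketch {

    // 0x00 -> 0x01 0x01 and 0x01 -> 0x01 0x02, so the output never contains 0x00.
    static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0x00) {
                out.write(0x01);
                out.write(0x01);
            } else if (b == 0x01) {
                out.write(0x01);
                out.write(0x02);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Reverses escape(): a 0x01 byte is always followed by 0x01 or 0x02.
    static byte[] unescape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == 0x01) {
                i++;
                if (i >= in.length || (in[i] != 0x01 && in[i] != 0x02)) {
                    throw new IllegalArgumentException("malformed escape sequence");
                }
                out.write(in[i] - 1); // 0x01 -> 0x00, 0x02 -> 0x01
            } else {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Assumption: escaped elements are separated by a single 0x00 byte.
    // An empty list is not distinguished from a list holding one empty element.
    static byte[] join(List<byte[]> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                out.write(0x00);
            }
            byte[] escaped = escape(elements.get(i));
            out.write(escaped, 0, escaped.length);
        }
        return out.toByteArray();
    }

    // Splits on 0x00 and unescapes each piece.
    static List<byte[]> split(byte[] joined) {
        List<byte[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= joined.length; i++) {
            if (i == joined.length || joined[i] == 0x00) {
                result.add(unescape(Arrays.copyOfRange(joined, start, i)));
                start = i + 1;
            }
        }
        return result;
    }
}
```

Because escaped output never contains 0x00, the result of join can itself be escaped and joined again, which would explain why nesting to several dimensions works.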
--Adam
On Mon, Dec 28, 2015 at 12:21 PM, Josh Elser wrote:
Looks like you would have to implement some kind of ComparableMap to
be able to use the PairLexicoder (see that the parameterization
requires both types in the Pair to implement Comparable). The Pair
lexicoder requires these Comparable types to align itself with the
original goal of the Lexicoders: provide byte-array serialization
for types whose sort order matches the original object's ordering.
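
To make the "sort order matches" goal concrete, here is a small self-contained sketch of an order-preserving encoding for a signed long. It is a generic technique (flip the sign bit, write big-endian) and is not claimed to be what any particular Accumulo Lexicoder does internally; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a sort-preserving encoding; not Accumulo code.
final class SortableLongSketch {

    // Flipping the sign bit makes negative values sort before positive ones
    // when the big-endian bytes are compared as unsigned values.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        return ByteBuffer.allocate(Long.BYTES).putLong(flipped).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, comparing the encoded bytes gives the same answer as Long.compare on the original values, which is the property a Comparable-based Lexicoder is trying to preserve.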
Typically, when we have key to value style data we want to put in
Accumulo, it makes sense to leverage the Column Qualifier and the
Value, instead of serializing everything into one Accumulo Value.
Iterators make it easy to do server-side predicates and
transformations. My hunch is that this is another reason why you
don't already see a MapLexicoder provided.
One technical difficulty you might run into implementing a
generalized MapLexicoder is how you delimit the key and value in one
pair and how you delimit many pairs from each other. Commonly, the
"null" byte (\x00) is used as a separator since it doesn't often
appear in user-data. I'm not sure if some of the other Lexicoders
already use this in their serialization (e.g. the ListLexicoder
might, I haven't looked at the code). Nesting Lexicoders generically
might be tricky (although not impossible) -- thought it was worth
mentioning to make sure you thought about it.
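
The delimiting problem can be shown with a deliberately naive sketch: a String-to-String map written as key, 0x00, value, 0x00, key, and so on, with no escaping. Everything here (the class name, the layout, the use of UTF-8 Strings) is an assumption made for illustration and is not an existing Lexicoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, intentionally naive encoding used only to show the pitfall.
final class NaiveMapDelimiterSketch {

    // Layout: key 0x00 value 0x00 key 0x00 value ... (no escaping at all).
    static byte[] encode(Map<String, String> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (!first) {
                out.write(0x00);
            }
            first = false;
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
            out.write(k, 0, k.length);
            out.write(0x00);
            out.write(v, 0, v.length);
        }
        return out.toByteArray();
    }

    // Splits on every 0x00 and pairs the fields up as key, value, key, value.
    static Map<String, String> decode(byte[] data) {
        Map<String, String> map = new TreeMap<>();
        if (data.length == 0) {
            return map;
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == 0x00) {
                fields.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (fields.size() % 2 != 0) {
            throw new IllegalArgumentException("odd number of fields: " + fields.size());
        }
        for (int i = 0; i < fields.size(); i += 2) {
            map.put(fields.get(i), fields.get(i + 1));
        }
        return map;
    }
}
```

This round-trips as long as no key or value contains a 0x00 byte. A key such as "a\0b" produces an extra field, so decoding either fails or silently returns the wrong pairs. Nested serialized data will contain 0x00 as soon as the inner encoding uses the same delimiter, so a general MapLexicoder would need some form of escaping for keys and values.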
Adam J. Shook wrote:
Hello all,
Any suggestions for using a Map Lexicoder (or implementing one)? I am
currently using a new ListLexicoder(new PairLexicoder(some lexicoder,
some lexicoder)), which is working for single maps. However, when one of
the lexicoders in the Pair is itself a Map (and therefore another
ListLexicoder(PairLexicoder)), an exception is being thrown because
ArrayList is not Comparable.
Regards,
--Adam