Below is a link to a simple client-side compression scheme. I thought it
might be of interest to some members of the list.

Column values and column names are easy to handle on the client side (a
custom comparator takes care of the column names). Row keys are harder:
because there is only one row partitioner for all column families, compressing
row keys gets complicated when the keys of different column families use
different data types. Using properties of Unicode, the scheme below can
distinguish between uncompressed Unicode strings, compressed Unicode strings,
uncompressed UUIDs, and a pass-through code for no compression, for a one-byte
penalty. For my project I only use Unicode strings and UUIDs as row keys, so
this works well for me. The compression algorithm itself handles both short
strings, using arithmetic coding with a static probability table, and long
strings, using adaptive arithmetic coding. Your mileage may vary. I will have
code for this design in a month or two.
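To make the key-tagging idea concrete, here is a minimal sketch in Python. It
is not the design from the linked page, only one way such a scheme could work,
under these assumptions: plain UTF-8 strings are stored with no prefix, and the
other three cases get a one-byte marker chosen from 0xFD-0xFF, bytes that never
appear in valid UTF-8 (RFC 3629). The marker values and the names encode_key,
decode_key, TAG_COMPRESSED, TAG_UUID and TAG_PASSTHROUGH are hypothetical, and
zlib is swapped in as a stand-in for the arithmetic coder described above.

```python
import uuid
import zlib
from typing import Union

# Hypothetical marker bytes. 0xF8-0xFF never occur in valid UTF-8 (RFC 3629),
# so a key that starts with one of them cannot be a plain UTF-8 string.
TAG_COMPRESSED = 0xFD   # assumed: a compressed Unicode string follows
TAG_UUID = 0xFE         # assumed: 16 raw UUID bytes follow
TAG_PASSTHROUGH = 0xFF  # assumed: arbitrary bytes, stored uncompressed

Key = Union[str, bytes, uuid.UUID]


def encode_key(key: Key) -> bytes:
    """Encode a row key. Plain strings carry no prefix; other forms cost one byte."""
    if isinstance(key, uuid.UUID):
        return bytes([TAG_UUID]) + key.bytes
    if isinstance(key, bytes):
        return bytes([TAG_PASSTHROUGH]) + key
    raw = key.encode("utf-8")
    # zlib is only a stand-in for the arithmetic coder described in the post.
    packed = zlib.compress(raw, 9)
    # Use the compressed form only if it wins even after paying for the tag byte.
    if len(packed) + 1 < len(raw):
        return bytes([TAG_COMPRESSED]) + packed
    return raw


def decode_key(data: bytes) -> Key:
    """Invert encode_key by inspecting the first byte."""
    if not data:
        return ""  # the empty string encodes to zero bytes
    tag = data[0]
    if tag == TAG_UUID:
        return uuid.UUID(bytes=data[1:])
    if tag == TAG_PASSTHROUGH:
        return data[1:]
    if tag == TAG_COMPRESSED:
        return zlib.decompress(data[1:]).decode("utf-8")
    # Any other first byte means the key is an untagged UTF-8 string.
    return data.decode("utf-8")
```

With this layout a short string such as "hello" is stored as its five UTF-8
bytes with no overhead, while a UUID key always takes 17 bytes (one marker
byte plus the 16-byte value).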

http://www.semanticartifacts.com/compression/compression.html

-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com