Knowing that SequenceFiles can store data (especially numeric data) much more compactly than text, I started converting our Hive database from LZO-compressed text format to LZO-compressed SequenceFiles.
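To illustrate the kind of size difference I was expecting, here is a minimal sketch in Python. It is not how Hive serializes rows; the sample values, the tab delimiter, and the fixed 4-byte integer encoding are all assumptions made up for the comparison, and it ignores compression.

```python
import struct

# Hypothetical sample values, chosen only for illustration.
values = [1234567890, 987654321, 2147483647, 1000000007]

# Text form: decimal digits, tab-separated, one newline-terminated row.
text_row = ("\t".join(str(v) for v in values) + "\n").encode("ascii")

# Binary form: each value as a fixed-width 4-byte big-endian signed int.
binary_row = struct.pack(">%di" % len(values), *values)

print(len(text_row), len(binary_row))  # prints "43 16"
```

For small values (single digits, say) the text form can be just as small or smaller, so the gain depends on the data, and compression narrows the gap further.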
My first observation was that the files did not get any smaller, which surprised me, since we have mostly numeric data, which has a more compact binary representation. So I issued some "describe extended" queries to poke around in the SequenceFile format used by Hive. It seems that 1) the keys are not used, and 2) all the values are simply stored as a Text Writable. Is this simply a copy of the textual representation that was used in the text files? That would explain why the data did not get any smaller, but it would also defeat all the benefits of SequenceFiles, no?

Thanks,
Koert