I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient. Your tests align with internal tests we ran a long time ago.
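
One possible reason a binary encoding is not automatically smaller: short decimal values can take fewer bytes as text than as fixed-width binary. The sketch below is only an illustration and does not use Hive's actual serialization code. It assumes doubles are written as fixed 8-byte IEEE 754 values on the binary side, and it ignores RCFile's per-field length metadata and any compression. The sample values are hypothetical, chosen to resemble lineitem columns such as l_quantity, l_discount and l_tax.

```python
import struct

# Hypothetical sample values resembling TPC-H lineitem columns
# (l_quantity, l_discount, l_tax); not taken from the actual tables.
samples = [17.0, 0.04, 0.02, 36.0, 0.09, 0.06]


def text_size(value):
    # Text-style encoding: the decimal string as UTF-8 bytes.
    return len(repr(value).encode("utf-8"))


def binary_size(value):
    # Assumed binary encoding: a fixed-width 8-byte IEEE 754 double.
    return len(struct.pack(">d", value))


text_total = sum(text_size(v) for v in samples)
binary_total = sum(binary_size(v) for v in samples)

print("text bytes:  ", text_total)
print("binary bytes:", binary_total)
```

For values like these the text form is the smaller one, while the binary form avoids parsing a string back into a number on read, which would be consistent with a CPU saving without a space saving.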

On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai <huaiyin....@gmail.com> wrote:
> Hi,
>
> Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> general?
>
> Let me make my question more specific.
>
> I generated two tables from the table lineitem of TPC-H
> using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
> CREATE TABLE lineitem_rcfile_lazybinary
> ROW FORMAT SERDE
> "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
> STORED AS RCFile AS
> SELECT * from lineitem;
>
> CREATE TABLE lineitem_rcfile_lazy
> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> STORED AS RCFile AS
> SELECT * from lineitem;
>
> Since serialization of LazyBinaryColumnarSerDe is binary-based and that
> of ColumnarSerDe is text-based, I expect to see
> table lineitem_rcfile_lazybinary is smaller than lineitem_rcfile_lazy.
> However, no matter whether compression is
> enabled, lineitem_rcfile_lazybinary is little bit larger
> than lineitem_rcfile_lazy. Did I use LazyBinaryColumnarSerDe in a wrong way?
>
> btw, the row group size of RCFile is 32MB.
>
> Thanks,
>
> Yin