Hi,

Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in general?
Let me make my question more specific. I generated two tables from the TPC-H lineitem table, one using LazyBinaryColumnarSerDe and one using ColumnarSerDe, as follows:

CREATE TABLE lineitem_rcfile_lazybinary
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

CREATE TABLE lineitem_rcfile_lazy
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

Since LazyBinaryColumnarSerDe serializes data in a binary format and ColumnarSerDe serializes it as text, I expected lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy. However, regardless of whether compression is enabled, lineitem_rcfile_lazybinary is a little bit larger than lineitem_rcfile_lazy. Did I use LazyBinaryColumnarSerDe incorrectly?

By the way, the row group size of RCFile is 32MB.

Thanks,
Yin
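
P.S. To make the text-versus-binary expectation above concrete, here is a rough Python sketch that compares the byte length of a value written as text with a fixed-width binary encoding of the same value. This is only an illustration under my own assumptions: the helper names and the sample values are made up, and the fixed-width layout (4-byte int, 8-byte double) is not necessarily how either SerDe actually encodes a column.

```python
import struct

def text_size(value):
    # Bytes needed to store the value as UTF-8 text.
    return len(str(value).encode("utf-8"))

def fixed_binary_size(value):
    # Assumed fixed-width layout (NOT Hive's actual format):
    # 8-byte big-endian double for floats, 4-byte big-endian int otherwise.
    fmt = ">d" if isinstance(value, float) else ">i"
    return len(struct.pack(fmt, value))

# Hypothetical sample values, chosen only for illustration.
samples = [7, 17, 0.04, 21168.23, 1234567890]

for v in samples:
    print(f"{v!r}: text={text_size(v)} bytes, binary={fixed_binary_size(v)} bytes")
```

Under these assumptions, short values can take fewer bytes as text than as fixed-width binary, while long integers take more, so which encoding comes out smaller may depend on the data in each column.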