All,

As part of moving ORC out of Hive, we pulled all of the vectorization
storage and sarg classes into a separate module, which is named
storage-api.  Although it is currently only used by ORC, it could be used
by Parquet or Avro if they wanted to make a fast vectorized reader that
read directly into Hive's VectorizedRowBatch without needing a shim or
data copy. Note that this is in many ways similar to pulling the Arrow
project out of Drill.

This unfortunately still leaves us with a circular dependency between Hive
and ORC. I'd hoped that storage-api wouldn't change much, but it seems
to keep changing. As a result, ORC ends up shipping its own
fork of storage-api.

Although we could make a new project for just the storage-api, I think it
would be better to make it a subproject of Hive that is released
independently.

What do others think?

   Owen
