On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:
>> Hive users who wished to use ORC would obviously need to pull in ORC
>> artifacts in addition to Hive.
>
> What would happen with Hive features that (currently) only work with ORC?
> Would they be extended to work with other file formats and stay in Hive?
> What about future features -- would they have to work with multiple file
> formats from the get-go?

The storage-api module proposed above would lead to clearer storage interfaces in Hive. That will in turn help to implement such features using other storage, including Parquet, HBase, etc. The result of this work will not automatically make those features work with formats other than ORC; somebody would need to do that.

Whether future features would work for all formats would depend on whether the new feature needs new functionality to be supported by the storage layer. If the feature needs new storage functionality, I would expect new interfaces to be defined in Hive, and then implemented by the storage engines that want to support that feature.

This will not negatively impact the experience of users with respect to ORC or other storage formats. The way we package Parquet in Hive, we can package ORC as well. In fact, users would be able to upgrade the version of ORC they use more easily, as releases can happen independently of each other.
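
To make the "new interface defined in Hive, implemented by the storage engines that want it" idea concrete, here is a rough sketch. None of these names come from Hive or the proposed storage-api module: StorageFormat, SupportsNewFeature, OrcLikeFormat and OtherFormat are all hypothetical, and which format implements the optional interface is chosen arbitrarily for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StorageApiSketch {

    // Hypothetical base interface that every storage format would implement.
    interface StorageFormat {
        String name();
    }

    // Hypothetical optional interface, defined on the Hive side, for a new
    // feature that needs extra functionality from the storage layer.
    interface SupportsNewFeature {
        String applyNewFeature(String input);
    }

    // A format that opts in by implementing the optional interface.
    static final class OrcLikeFormat implements StorageFormat, SupportsNewFeature {
        public String name() { return "orc-like"; }
        public String applyNewFeature(String input) { return name() + ":" + input; }
    }

    // A format that has not (yet) implemented the optional interface.
    static final class OtherFormat implements StorageFormat {
        public String name() { return "other"; }
    }

    // Feature code asks each format whether it supports the capability
    // instead of hard-coding a single format.
    static List<String> formatsSupportingFeature(List<StorageFormat> formats) {
        List<String> names = new ArrayList<>();
        for (StorageFormat f : formats) {
            if (f instanceof SupportsNewFeature) {
                names.add(f.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<StorageFormat> formats =
            Arrays.asList(new OrcLikeFormat(), new OtherFormat());
        System.out.println(formatsSupportingFeature(formats));
    }
}
```

With this shape, a format gains the feature by implementing the optional interface in its own release, without the feature code in Hive having to change.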