Re: [DISCUSS] ORC separate project

Owen O'Malley Wed, 08 Apr 2015 08:33:30 -0700

On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:


> If I understood Allen's #2 comment, we are moving the existing ORC code out of
> Hive and making it a separate project, which I definitely missed.
>

I'm sorry that wasn't clear. Yes, most of the code that is currently in
org.apache.hadoop.hive.ql.io.orc will move to the new project.

The biggest change on the Hive side will be to create a new Hive module
that defines the API that storage formats like ORC need to code against if
they want high-performance integration with Hive's vectorization. I've
started that jira at https://issues.apache.org/jira/browse/HIVE-10171 .
Creating this API should give us a clean interface for storage formats,
which will help ORC and other columnar formats like Trevni or Parquet.

Once the ORC project has made its first release, we can create a Hive jira
to replace the Hive ORC code with a reference to the ORC release jar.


> Since the existing Hive PMC has governance over the code, I would expect
> that to still be the case even after the spinoff.
>

No, Apache doesn't allow umbrella projects where one PMC controls
sub-projects. The Apache board has found that overseeing projects
directly, rather than indirectly through another PMC, reduces problems.

.. Owen
