On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
> If I understood Allen's #2 comment, we are moving existing ORC code out of
> Hive and make it a separate project, which I definitely missed.
>

I'm sorry that wasn't clear. Yes, most of the code that is currently in
org.apache.hadoop.hive.ql.io.orc will move to the new project. The biggest
change on the Hive side will be to create a new Hive module that defines the
API that storage formats like ORC need to code against if they want high
performance integration with Hive's vectorization. I've started that jira at
https://issues.apache.org/jira/browse/HIVE-10171 . Creating this API should
help us create a clean interface for storage formats that will help ORC and
other columnar formats like Trevni or Parquet.

Once the ORC project has made its first release, we can create a Hive jira to
replace the Hive ORC code with a reference to the ORC release jar.

> Since existing Hive PMC has governance on the code, I would expect it's
> still the case even after the spinoff.
>

No, Apache doesn't allow umbrella projects where one PMC controls
sub-projects. The reason is that the Apache board has found that controlling
projects directly instead of indirectly through another PMC reduces the
problems.

.. Owen