To Lefty's comment - Yes, anyone can take Apache code and make another project at will. However, changes made to an existing project as part of that process, such as what Owen described for ORC in Hive, are certainly something that the Hive PMC can control or vote on. Nevertheless, that's not my immediate concern.
To Owen's explanation - Thanks. My major concern is that we seem to be breaking apart Hive's integrity and making it harder to release and maintain because of an increasing number of external dependencies. Let's say that Hive depends on a certain version of ORC (as a TLP) and ORC is found to have a bug that seriously impacts Hive users. We could not release a fixed Hive as quickly as we would like, since doing so would require the ORC community to fix the problem and make a release, over which the Hive PMC has no control. By contrast, with ORC inside Hive, the Hive community can quickly fix the problem and make a release without waiting for another project to make one. I'm not sure this move (ORC as a TLP) will be beneficial to Hive's many users. If this is not convincing, let me propose that we spin off the metastore as a TLP tomorrow as well!

Thanks,
Xuefu

On Wed, Apr 8, 2015 at 8:33 AM, Owen O'Malley <omal...@apache.org> wrote:

> On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>
> > If I understood Allen's #2 comment, we are moving existing ORC code out of
> > Hive and make it a separate project, which I definitely missed.
> >
>
> I'm sorry that wasn't clear. Yes, most of the code that is currently in
> org.apache.hadoop.hive.ql.io.orc will move to the new project.
>
> The biggest change on the Hive side will be to create a new Hive module
> that defines the API that storage formats like ORC need to code against if
> they want high performance integration with Hive's vectorization. I've
> started that jira at https://issues.apache.org/jira/browse/HIVE-10171 .
> Creating this API should help us create a clean interface for storage
> formats that will help ORC and other columnar formats like Trevni or
> Parquet.
>
> Once the ORC project has made its first release, we can create a Hive jira
> to replace the Hive ORC code with a reference to the ORC release jar.
>
>
> > Since existing Hive PMC has governance on the code, I would expect it's
> > still the case even after the spinoff.
> >
>
> No, Apache doesn't allow umbrella projects where one PMC controls
> sub-projects. The reason is that the Apache board has found that
> controlling projects directly instead of indirectly through another PMC
> reduces the problems.
>
> .. Owen