IMHO there are 2 separate concerns: forking ORC, and Hive using the "new" ORC. The first one does not really require a vote, as discussed on private/board - anyone can fork part of the code (in this case, at least). Then, for Hive switching to the "new" ORC, I'm not sure that requires a vote either. We didn't vote when we added the Kryo or Spark or Tez dependency... it's just a (big) code change. 3 +1s like a branch merge will be enough, or even one +1 maybe.
The 2nd concern, about fixing issues quickly, doesn't make sense - it can happen with any dependency. What if Guava or Kryo or Spark or Tez has a bug? We can still ship Hive as long as the dependency can be updated to the correct version.

On 15/4/10, 20:05, "Xuefu Zhang" <xzh...@cloudera.com> wrote:

>To Lefty's comment - Yes, anyone can take Apache code and make another
>project at will. However, for changes made to an existing project as part
>of that process, such as what Owen described for ORC in Hive, it is
>certainly something that Hive PMC can control or vote on. Nevertheless,
>that's not my immediate concern.
>
>To Owen's explanation - Thanks. I guess my major concern is that we
>seemingly are breaking apart Hive's integrity and making it hard to release
>and maintain due to increasing number of external dependents. Let's say
>that Hive depends on a certain version of ORC (as TLP) and it's found that
>ORC has a bug that seriously impacts Hive users. We cannot release Hive as
>fast as we can, since doing so would need ORC community to fix the problem
>and make a release, for which Hive PMC has no control. On the contrary,
>Hive community can quickly fix the problem and make a release without
>waiting for other projects to make a release. I'm not sure this move (ORC
>as TLP) will be beneficial to vast Hive users.
>
>If this is not convincing, let me propose that we spin off metastore also as
>TLP tomorrow!
>
>Thanks,
>Xuefu
>
>
>On Wed, Apr 8, 2015 at 8:33 AM, Owen O'Malley <omal...@apache.org> wrote:
>
>> On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>
>> > If I understood Allen's #2 comment, we are moving existing ORC code out of
>> > Hive and make it a separate project, which I definitely missed.
>> >
>>
>> I'm sorry that wasn't clear. Yes, most of the code that is currently in
>> org.apache.hadoop.hive.ql.io.orc will move to the new project.
>>
>> The biggest change on the Hive side will be to create a new Hive module
>> that defines the API that storage formats like ORC need to code against if
>> they want high performance integration with Hive's vectorization. I've
>> started that jira at https://issues.apache.org/jira/browse/HIVE-10171 .
>> Creating this API should help us create a clean interface for storage
>> formats that will help ORC and other columnar formats like Trevni or
>> Parquet.
>>
>> Once the ORC project has made its first release, we can create a Hive jira
>> to replace the Hive ORC code with a reference to the ORC release jar.
>>
>>
>> > Since existing Hive PMC has governance on the code, I would expect it's
>> > still the case even after the spinoff.
>> >
>>
>> No, Apache doesn't allow umbrella projects where one PMC controls
>> sub-projects. The reason is that the Apache board has found that
>> controlling projects directly instead of indirectly through another PMC
>> reduces the problems.
>>
>> .. Owen
>>