Re: [DISCUSS] ORC separate project

Sergey Shelukhin Mon, 13 Apr 2015 13:45:12 -0700

IMHO there are 2 separate concerns: forking ORC, and Hive using the "new" ORC.
The first one does not really require a vote, as discussed on private/board
- anyone can fork part of the code (in this case, at least). As for Hive
switching to the "new" ORC, I'm not sure that requires a vote either. We
didn't vote when we added the Kryo, Spark, or Tez dependencies... it's just a
(big) code change. Three +1s, as for a branch merge, will be enough, or maybe
even one +1.


The second concern, about fixing issues quickly, doesn't make sense; it can
happen with any dependency. What if Guava, Kryo, Spark, or Tez has a
bug? We can still ship Hive as long as the dependency can be updated to
the corrected version.

On 15/4/10, 20:05, "Xuefu Zhang" <xzh...@cloudera.com> wrote:

>To Lefty's comment -  Yes, anyone can take Apache code and make another
>project at will. However, for changes made to an existing project as part
>of that process, such as what Owen described for ORC in Hive, it is
>certainly something that Hive PMC can control or vote on. Nevertheless,
>that's not my immediate concern.
>
>To Owen's explanation - Thanks. I guess my major concern is that we
>seemingly are breaking apart Hive's integrity and making it hard to release
>and maintain due to an increasing number of external dependencies. Let's say
>that Hive depends on a certain version of ORC (as a TLP) and it's found that
>ORC has a bug that seriously impacts Hive users. We cannot release a fixed
>Hive as fast as we could, since doing so would need the ORC community to fix
>the problem and make a release, over which the Hive PMC has no control. By
>contrast, the Hive community can quickly fix the problem and make a release
>without waiting for other projects to make a release. I'm not sure this move
>(ORC as a TLP) will be beneficial to the vast majority of Hive users.
>
>If this is not convincing, let me propose that we spin off the metastore as
>a TLP tomorrow as well!
>
>Thanks,
>Xuefu
>
>
>On Wed, Apr 8, 2015 at 8:33 AM, Owen O'Malley <omal...@apache.org> wrote:
>
>> On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>
>> > If I understood Allen's #2 comment, we are moving existing ORC code out
>> > of Hive and making it a separate project, which I definitely missed.
>> >
>>
>> I'm sorry that wasn't clear. Yes, most of the code that is currently in
>> org.apache.hadoop.hive.ql.io.orc will move to the new project.
>>
>> The biggest change on the Hive side will be to create a new Hive module
>> that defines the API that storage formats like ORC need to code against if
>> they want high-performance integration with Hive's vectorization. I've
>> started that jira at https://issues.apache.org/jira/browse/HIVE-10171 .
>> Creating this API should help us create a clean interface for storage
>> formats that will help ORC and other columnar formats like Trevni or
>> Parquet.
>>
>> Once the ORC project has made its first release, we can create a Hive jira
>> to replace the Hive ORC code with a reference to the ORC release jar.
>>
>>
>> > Since the existing Hive PMC has governance over the code, I would expect
>> > it's still the case even after the spinoff.
>> >
>>
>> No, Apache doesn't allow umbrella projects where one PMC controls
>> sub-projects. The reason is that the Apache board has found that
>> controlling projects directly instead of indirectly through another PMC
>> reduces the problems.
>>
>> .. Owen
>>
