I guess I'm echoing previous concerns, in less technical language.  (Should
have reread the thread before sending.)

-- Lefty

On Fri, Apr 3, 2015 at 4:25 PM, Lefty Leverenz <leftylever...@gmail.com>
wrote:

>> Hive users who wished to use ORC would obviously need to pull in ORC
>> artifacts in addition to Hive.
>>
>
> What would happen with Hive features that (currently) only work with ORC?
> Would they be extended to work with other file formats and stay in Hive?
> What about future features -- would they have to work with multiple file
> formats from the get-go?
>
> -- Lefty
>
> On Fri, Apr 3, 2015 at 3:51 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> A couple of points:
>>
>> 1) ORC isn't going into the incubator.  The proposal before the board is
>> for it to go straight to TLP.  There's no graduation to depend on.
>> 2) As currently proposed Hive would not depend on ORC to build.  Hive
>> users who wished to use ORC would obviously need to pull in ORC artifacts
>> in addition to Hive.  Given this I don't think it makes any sense to fork
>> ORC and have it in both places.  This actually seems the worse outcome, as
>> the two will inevitably diverge.
>>
>> Alan.
>>
>>   Xuefu Zhang <xzh...@cloudera.com>
>>  April 3, 2015 at 6:41
>> I actually have a different thought to share along the same line.
>>
>> ORC is not a subproject in Hive. I'm not sure that performing surgery on
>> Hive in order to make ORC a TLP is the best we can do. Not only may this
>> bring instability to Hive, but it also makes Hive depend on an incubating
>> project. Not every project graduates (though I do wish ORC success as a
>> TLP); some of them fail.
>>
>> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
>> whatever it has. This way, the new project can do whatever it wants, and
>> the Hive community probably doesn't care and has no say in it. Once ORC
>> graduates as a TLP, the Hive community can decide whether to go along with
>> it and, if so, how to integrate with it.
>>
>> I think this will calm the current controversy, help ORC proceed faster
>> as a TLP, and leave the decision to the near future.
>>
>> Thanks,
>> Xuefu
>>
>>
>>   Szehon Ho <sze...@cloudera.com>
>>  April 2, 2015 at 23:54
>> I also agree with this goal.
>>
>> As such, I think we should first see the proposal (JIRA?) for the
>> storage-api refactoring and the other work related to separating Orc as a
>> TLP before the actual separation happens, to make sure the separation is
>> not done in a way that takes us further from this goal. It may very well be
>> that this refactoring moves us closer to the goal, but seeing the proposal
>> first would give a lot of clarity.
>>
>> Thanks
>> Szehon
>>
>>   Edward Capriolo <edlinuxg...@gmail.com>
>>  April 2, 2015 at 22:20
>> To reiterate, one thing I want to avoid is having Hive rely on code that
>> sits in several tiny silos across Apache projects, or Apache Licensed but
>> not ASF projects. Hive is a mature TLP with a large number of committers
>> and it would not be a good situation if work often gets bottlenecked
>> because changes have to be made across two projects simultaneously to
>> commit a feature. Especially if the two projects do not share the same
>> committer list.
>>
>> I think if it could be done perfectly, things like ORC, Parquet, whatever
>> would be <provided> scope dependencies, meaning the project can be built
>> without a particular piece but as a whole the project still works. (That
>> might be easier said than done :)
>>
>>
>>   Nick Dimiduk <ndimi...@gmail.com>
>>  April 1, 2015 at 11:51
>> I think the storage-api would be very helpful for HBase integration as
>> well.
>>
>>
>>   Owen O'Malley <omal...@apache.org>
>>  April 1, 2015 at 11:22
>>
>>
>>
>>>
>>> What I'd like to see here is well defined interfaces in Hive so that any
>>> storage format that wants can implement them.  Hopefully that means things
>>> like interfaces and utility classes for acid, sargs, and vectorization move
>>> into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
>>> on this module without needing to pull in all of Hive.
>>>
>>> Then Hive contributors would only be forced to make changes in Orc when
>>> they want to implement something in Orc.
>>>
>>
>> Agreed. The goal of the new module is to keep a clean separation between
>> the code for ORC and Hive so that vectorization, sargs, and acid are kept in
>> Hive and are not moved to or duplicated in the ORC project.
>>
>> .. Owen
>>
>>
>
