I guess I'm echoing previous concerns, in less technical language. (Should have reread the thread before sending.)
-- Lefty

On Fri, Apr 3, 2015 at 4:25 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:

>> Hive users who wished to use ORC would obviously need to pull in ORC
>> artifacts in addition to Hive.
>
> What would happen with Hive features that (currently) only work with ORC?
> Would they be extended to work with other file formats and stay in Hive?
> What about future features -- would they have to work with multiple file
> formats from the get-go?
>
> -- Lefty
>
> On Fri, Apr 3, 2015 at 3:51 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> A couple of points:
>>
>> 1) ORC isn't going into the incubator. The proposal before the board is
>> for it to go straight to TLP. There's no graduation to depend on.
>> 2) As currently proposed, Hive would not depend on ORC to build. Hive
>> users who wished to use ORC would obviously need to pull in ORC artifacts
>> in addition to Hive. Given this, I don't think it makes any sense to fork
>> ORC and have it in both places. This actually seems the worse outcome, as
>> the two will inevitably diverge.
>>
>> Alan.
>>
>> Xuefu Zhang <xzh...@cloudera.com>
>> April 3, 2015 at 6:41
>> I actually have a different thought to share along the same line.
>>
>> ORC is not a subproject in Hive. I'm not sure it's the best we can do to
>> perform surgery on Hive in order to make ORC a TLP. Not only may this
>> bring instability to Hive, but it also makes Hive depend on an incubating
>> project. Not every project graduates (though I do wish ORC success as a
>> TLP); some of them fail.
>>
>> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
>> whatever it has. This way, the new project can do whatever it wants, and
>> the Hive community probably doesn't care and has no say in it. Once ORC
>> as a TLP graduates, the Hive community can decide whether to go along
>> with it and, if so, how to integrate with it.
>>
>> I think this will settle the current controversy, help ORC proceed faster
>> as a TLP, and leave the decision to the near future.
>>
>> Thanks,
>> Xuefu
>>
>>
>> Szehon Ho <sze...@cloudera.com>
>> April 2, 2015 at 23:54
>> I also agree with this goal.
>>
>> As such, I think we should first see the proposal (JIRA?) for the
>> storage-api refactoring and the other work related to separating Orc as
>> a TLP before the actual separation happens, to make sure the separation
>> is not done in a way that takes us further from this goal. It may very
>> well be that this refactoring moves us closer to the goal, but seeing the
>> proposal first would give a lot of clarity.
>>
>> Thanks
>> Szehon
>>
>> On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <edlinuxg...@gmail.com>
>>
>> Edward Capriolo <edlinuxg...@gmail.com>
>> April 2, 2015 at 22:20
>> To reiterate, one thing I want to avoid is having Hive rely on code that
>> sits in several tiny silos across Apache projects, or in Apache-licensed
>> but non-ASF projects. Hive is a mature TLP with a large number of
>> committers, and it would not be a good situation if work often got
>> bottlenecked because changes had to be made across two projects
>> simultaneously to commit a feature. Especially if the two projects do not
>> share the same committer list.
>>
>> I think, if it could be done perfectly, things like ORC, Parquet, whatever
>> would be <provided> scope dependencies, meaning the project can be built
>> without a particular piece but as a whole the project still works. (That
>> might be easier said than done :)
>>
>>
>> Nick Dimiduk <ndimi...@gmail.com>
>> April 1, 2015 at 11:51
>> I think the storage-api would be very helpful for HBase integration as
>> well.
>>
>>
>> Owen O'Malley <omal...@apache.org>
>> April 1, 2015 at 11:22
>>
>>> What I'd like to see here is well-defined interfaces in Hive so that any
>>> storage format that wants to can implement them.
>>> Hopefully that means things like interfaces and utility classes for
>>> acid, sargs, and vectorization move into this new Hive module,
>>> storage-api. Then Orc, Parquet, etc. can depend on this module without
>>> needing to pull in all of Hive.
>>>
>>> Then Hive contributors would only be forced to make changes in Orc when
>>> they want to implement something in Orc.
>>
>> Agreed. The goal of the new module is to keep a clean separation between
>> the code for ORC and Hive, so that vectorization, sargs, and acid are kept
>> in Hive and are not moved to or duplicated in the ORC project.
>>
>> .. Owen
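
For readers following the storage-api idea in the quoted messages above, here is a rough Java sketch of the kind of separation being discussed. It is only an illustration under stated assumptions: RowFilter, FormatReader, InMemoryFormatReader, and countMatching are hypothetical names made up for this sketch, not classes from Hive or ORC, and the row filter is just a stand-in for something like a sarg. The point is that the engine side is written against a small shared set of interfaces, and a file format implements them without pulling in the rest of the engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names come from Hive or ORC.
final class StorageApiSketch {

    /** Would live in the shared module: a pushed-down row filter (stand-in for a sarg). */
    interface RowFilter extends Predicate<long[]> {}

    /** Would live in the shared module: the contract a file format implements. */
    interface FormatReader {
        String formatName();
        List<long[]> read(RowFilter filter);
    }

    /** Would live in the format's own project; depends only on the interfaces above. */
    static final class InMemoryFormatReader implements FormatReader {
        private final List<long[]> rows;

        InMemoryFormatReader(List<long[]> rows) {
            this.rows = rows;
        }

        @Override
        public String formatName() {
            return "in-memory-demo";
        }

        @Override
        public List<long[]> read(RowFilter filter) {
            // Apply the filter while "reading", returning only matching rows.
            List<long[]> out = new ArrayList<>();
            for (long[] row : rows) {
                if (filter.test(row)) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Engine side: written against FormatReader only, never a concrete format. */
    static int countMatching(FormatReader reader, RowFilter filter) {
        return reader.read(filter).size();
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[] {1, 10}, new long[] {2, 20}, new long[] {3, 30});
        FormatReader reader = new InMemoryFormatReader(rows);
        RowFilter filter = row -> row[1] >= 20; // keep rows whose second column is >= 20
        System.out.println(reader.formatName() + ": " + countMatching(reader, filter));
    }
}
```

With a layout like this, adding a feature to the engine would touch a format's project only when that format wants to implement the feature, which is the outcome the quoted messages ask for.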