RE: [DISCUSS] Making storage-api a separately released artifact

2016-08-28 Thread Xu, Cheng A
Pena [mailto:sergio.p...@cloudera.com] Sent: Saturday, August 27, 2016 3:59 AM To: dev Subject: Re: [DISCUSS] Making storage-api a separately released artifact Question: Wouldn't it be better to move part of the implementations to ORC, Parquet and Avro, and just have some interfaces and basic implementatio

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-26 Thread Matthew McCline
face for faster direct object access to the ColumnVector family. From: Sergio Pena Sent: Friday, August 26, 2016 12:58 PM To: dev Subject: Re: [DISCUSS] Making storage-api a separately released artifact Question: Wouldn't it be better to move part of the im

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-26 Thread Sergio Pena
Question: Wouldn't it be better to move part of the implementations to ORC, Parquet and Avro, and just have some interfaces and basic implementations in Hive? This way we could avoid having ORC, Parquet and/or Avro depend on Hive. I saw this on Parquet where they created a RowBatch class internally and

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-19 Thread Lefty Leverenz
Sergey's idea is creative, although it leads to confusion about JIRA fix versions. Issues would be given fix versions based on assumptions about whether SA or Hive will be released first. (That's hard to predict when it's months away.) Keeping the version numbers tied together is very appealing.

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-19 Thread Sergey Shelukhin
I am suggesting we always skip the number, so only one component gets the next one :) In your example Hive trunk would be 2.3, and if SA is released again it would become 2.4. Otherwise we’d need a compatibility table because versions will be totally out of sync. On 16/8/19, 16:31, "Owen O'Malley" wrote:

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-19 Thread Owen O'Malley
That won't necessarily work, especially in the beginning. Suppose we release SA 2.2.0 and use it for Hive trunk on the assumption that the next Hive release will be 2.2. What do we do when we need to make an incompatible change in SA? I guess we could release SA as 2.3.0 and when Hive makes its next r

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-19 Thread Sergey Shelukhin
Can we just run the versions through? I.e., increment the version every time but release only one component (or both if they happen to align, I guess). E.g., storage-api will be released at 2.2, and, say, 2.3 if it moves fast, then Hive 2.4, then storage-api 2.5, etc. That might make it easier to reason about compa
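
The numbering scheme in this message can be sketched as a single counter shared by both components, where whichever component releases next takes the next minor number. This is only an illustration of the proposal as worded above, not anything the project adopted; the class name SharedVersionSequence and its methods are hypothetical.

```python
class SharedVersionSequence:
    """Hypothetical helper: one minor-version counter shared by storage-api and Hive."""

    def __init__(self, major, next_minor):
        self.major = major
        self.next_minor = next_minor
        self.history = []  # (component, version) pairs in release order

    def release(self, component):
        # Whichever component releases takes the next number; the other skips it.
        version = f"{self.major}.{self.next_minor}"
        self.history.append((component, version))
        self.next_minor += 1
        return version


# Release order taken from the example in the message.
seq = SharedVersionSequence(major=2, next_minor=2)
order = ("storage-api", "storage-api", "hive", "storage-api")
releases = [seq.release(component) for component in order]
print(releases)
```

Under this sketch no version number is ever used by both components, so a version alone identifies which component it belongs to and where it sits in the combined release order.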

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-19 Thread Sergio Pena
I see Parquet is currently using the SearchArgument class for predicate pushdown. Will this class be part of the new sub-module or project? Following Sushanth's idea, can we have other API interfaces in the new project that other components can use? Perhaps having this may be a good reason to crea

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Owen O'Malley
On Wed, Aug 17, 2016 at 10:46 AM, Alan Gates wrote:
> +1 for making the API clean and easy for other projects to work with. A
> few questions:
>
> 1) Would this also make it easier for Parquet and others to implement
> Hive’s ACID interfaces?
>
Currently the ACID interfaces haven't been moved o

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Sushanth Sowmyan
+1 for having a separate storage-api project to define common interfaces for people to develop against. It'll make things much easier to develop against generically. I'm okay (+0) with the sub-project idea rather than enthusiastic about it, mostly because I have reservations that it'll encourage

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Prasanth Jayachandran
+1 for making it a subproject with a separate (preferably shorter) release cycle. The module itself is too small for a separate project. Also, having a faster release cycle will resolve the circular dependency and will help other projects make use of vectorization, sarg, bloom filter, etc. For versi

Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Alan Gates
+1 for making the API clean and easy for other projects to work with. A few questions: 1) Would this also make it easier for Parquet and others to implement Hive’s ACID interfaces? 2) Would we make any attempt to coordinate version numbers between Hive and the storage module, or would a given

[DISCUSS] Making storage-api a separately released artifact

2016-08-15 Thread Owen O'Malley
All, As part of moving ORC out of Hive, we pulled all of the vectorization storage and sarg classes into a separate module, which is named storage-api. Although it is currently only used by ORC, it could be used by Parquet or Avro if they wanted to make a fast vectorized reader that read directly