Hi Alan,

Responses are inline below:
On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <ga...@hortonworks.com> wrote:
> Gunther, is it the case that there is anything extra that needs to be done
> to ship Parquet code with Hive right now? If I read the patch correctly the
> Parquet jars were added to the pom and thus will be shipped as part of
> Hive. As long as it works out of the box when a user says "create table ...
> stored as parquet" why do we care whether the parquet jar is owned by Hive
> or another project?
>
> The concern about feature mismatch in Parquet versus Hive is valid, but I'm
> not sure what to do about it other than assure that there are good error
> messages. Users will often want to use non-Hive based storage formats
> (Parquet, Avro, etc.). This means we need a good way to detect at SQL
> compile time that the underlying storage doesn't support the indicated data
> type and throw a good error.

Agreed, the error messages should absolutely be good. I will ensure this is
the case via https://issues.apache.org/jira/browse/HIVE-6457

> Also, it's important to be clear going forward about what Hive as a project
> is signing up for. If tomorrow someone decides to add a new datatype or
> feature we need to be clear that we expect the contributor to make this
> work for Hive-owned formats (text, RC, sequence, ORC) but not necessarily
> for external formats.

This makes sense to me. I'd just like to add that I have a patch available
to improve the hive-exec uber jar and general query speed:
https://issues.apache.org/jira/browse/HIVE-860

Additionally, I have a patch available to finish the generic STORED AS
functionality: https://issues.apache.org/jira/browse/HIVE-5976

Brock
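P.S. For anyone following along, the user-facing syntax being discussed is the Parquet storage-format clause. A minimal sketch (table and column names here are purely illustrative):

```sql
-- Illustrative only: a Parquet-backed table created directly in HiveQL,
-- with no extra jars or SerDe configuration supplied by the user.
CREATE TABLE web_logs (
  ts  BIGINT,
  url STRING
)
STORED AS PARQUET;
```

The "works out of the box" expectation above is that this statement succeeds as-is because the Parquet jars ship with Hive, and that an unsupported column type for the chosen format produces a clear error at compile time rather than a runtime failure.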