Some of these issues can be addressed in the documentation. The "File Formats" section of the Language Manual needs an overview, and that might be a good place to explain the differences between Hive-owned formats and external formats. Or the SerDe doc could be beefed up: Built-In SerDes<https://cwiki.apache.org/confluence/display/Hive/SerDe#SerDe-Built-inSerDes>.
In the meantime, I've added a link to the Avro doc in the "File Formats" list and mentioned Parquet in DDL's Row Format, Storage Format, and SerDe<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe> section:

    Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet<https://cwiki.apache.org/confluence/display/Hive/Parquet> columnar storage format in Hive 0.13.0 and later<https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.13andlater>; or use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ... in Hive 0.10, 0.11, or 0.12<https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.10-0.12>.

Does that work? (A rough sketch of both DDL forms is at the bottom of this message.)

-- Lefty

On Tue, Feb 18, 2014 at 1:31 PM, Brock Noland <br...@cloudera.com> wrote:
> Hi Alan,
>
> Response is inline, below:
>
> On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <ga...@hortonworks.com> wrote:
> > Gunther, is it the case that there is anything extra that needs to be done to ship Parquet code with Hive right now? If I read the patch correctly, the Parquet jars were added to the pom and thus will be shipped as part of Hive. As long as it works out of the box when a user says "create table ... stored as parquet", why do we care whether the parquet jar is owned by Hive or another project?
> >
> > The concern about feature mismatch in Parquet versus Hive is valid, but I'm not sure what to do about it other than ensure that there are good error messages. Users will often want to use non-Hive-based storage formats (Parquet, Avro, etc.). This means we need a good way to detect at SQL compile time that the underlying storage doesn't support the indicated data type and throw a good error.
>
> Agreed, the error messages should absolutely be good. I will ensure this is the case via https://issues.apache.org/jira/browse/HIVE-6457
>
> > Also, it's important to be clear going forward about what Hive as a project is signing up for. If tomorrow someone decides to add a new datatype or feature, we need to be clear that we expect the contributor to make this work for Hive-owned formats (text, RC, sequence, ORC) but not necessarily for external formats.
>
> This makes sense to me.
>
> I'd just like to add that I have a patch available to improve the hive-exec uber jar and general query speed: https://issues.apache.org/jira/browse/HIVE-860. Additionally, I have a patch available to finish the generic STORED AS functionality: https://issues.apache.org/jira/browse/HIVE-5976
>
> Brock
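P.S. Since the doc excerpt above elides the class names, here is a rough sketch of the two DDL forms it describes. The table and column names are just placeholders, and the SerDe/input/output format class names for Hive 0.10-0.12 are the ones I believe the external parquet-hive bundle uses -- please double-check them against the Parquet wiki page before relying on them:

    -- Hive 0.13.0 and later: no ROW FORMAT SERDE needed
    CREATE TABLE parquet_example (id INT, name STRING)
    STORED AS PARQUET;

    -- Hive 0.10, 0.11, or 0.12: spell out the SerDe and the input/output formats
    -- (class names assumed from the external parquet-hive bundle; verify on the wiki)
    CREATE TABLE parquet_example (id INT, name STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';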