Some of these issues can be addressed in the documentation. The "File Formats" section of the Language Manual needs an overview, and that might be a good place to explain the differences between Hive-owned formats and external formats. Or the SerDe doc could be beefed up: Built-In SerDes<https://cwiki.apache.org/confluence/display/Hive/SerDe#SerDe-Built-inSerDes>.
In the meantime, I've added a link to the Avro doc in the "File Formats" list and mentioned Parquet in DDL's Row Format, Storage Format, and SerDe<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe> section:

    Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet<https://cwiki.apache.org/confluence/display/Hive/Parquet> columnar storage format in Hive 0.13.0 and later<https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.13andlater>; or use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ... in Hive 0.10, 0.11, or 0.12<https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.10-0.12>.

Does that work? (A rough sketch of both DDL forms is at the bottom of this message.)

-- Lefty

On Tue, Feb 18, 2014 at 1:31 PM, Brock Noland <br...@cloudera.com> wrote:
> Hi Alan,
>
> Response is inline, below:
>
> On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <ga...@hortonworks.com> wrote:
> > Gunther, is it the case that there is anything extra that needs to be done to ship Parquet code with Hive right now? If I read the patch correctly, the Parquet jars were added to the pom and thus will be shipped as part of Hive. As long as it works out of the box when a user says "create table ... stored as parquet", why do we care whether the parquet jar is owned by Hive or another project?
> >
> > The concern about feature mismatch in Parquet versus Hive is valid, but I'm not sure what to do about it other than ensure that there are good error messages. Users will often want to use non-Hive-based storage formats (Parquet, Avro, etc.). This means we need a good way to detect at SQL compile time that the underlying storage doesn't support the indicated data type and throw a good error.
>
> Agreed, the error messages should absolutely be good. I will ensure this is the case via https://issues.apache.org/jira/browse/HIVE-6457
>
> > Also, it's important to be clear going forward about what Hive as a project is signing up for. If tomorrow someone decides to add a new datatype or feature, we need to be clear that we expect the contributor to make this work for Hive-owned formats (text, RC, sequence, ORC) but not necessarily for external formats.
>
> This makes sense to me.
>
> I'd just like to add that I have a patch available to improve the hive-exec uber jar and general query speed: https://issues.apache.org/jira/browse/HIVE-860. Additionally, I have a patch available to finish the generic STORED AS functionality: https://issues.apache.org/jira/browse/HIVE-5976
>
> Brock
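P.S. Since the doc excerpt above elides the class names, here is a rough sketch of the two DDL forms it describes. The table and column names are just placeholders, and the SerDe/input/output format class names for Hive 0.10-0.12 are the ones I believe the external parquet-hive bundle uses -- please double-check them against the Parquet wiki page before relying on them:

    -- Hive 0.13.0 and later: no ROW FORMAT SERDE needed
    CREATE TABLE parquet_example (id INT, name STRING)
    STORED AS PARQUET;

    -- Hive 0.10, 0.11, or 0.12: spell out the SerDe and the input/output formats
    -- (class names assumed from the external parquet-hive bundle; verify on the wiki)
    CREATE TABLE parquet_example (id INT, name STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';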