How is the STORED AS PARQUET used?

Remus Rusanu Wed, 05 Feb 2014 06:29:00 -0800

Hello all,

I tried the following on a build that has the latest HIVE-5783 patch applied 
over trunk:


hive> set hive.aux.jars.path=file:///usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/parquet-hadoop-bundle-1.3.2.jar;
hive> create table alltypes_parquet stored as parquet as select cint, ctinyint, csmallint, cdouble, cfloat, cstring1 from alltypesorc;
hive> show create table alltypes_parquet;
OK
CREATE  TABLE `alltypes_parquet`(
  `cint` int COMMENT 'from deserializer',
  `ctinyint` tinyint COMMENT 'from deserializer',
  `csmallint` smallint COMMENT 'from deserializer',
  `cdouble` double COMMENT 'from deserializer',
  `cfloat` float COMMENT 'from deserializer',
  `cstring1` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/alltypes_parquet'
TBLPROPERTIES (
  'numFiles'='1',
  'transient_lastDdlTime'='1391609238',
  'COLUMN_STATS_ACCURATE'='true',
  'totalSize'='256959',
  'numRows'='12288',
  'rawDataSize'='73728')
Time taken: 0.256 seconds, Fetched: 22 row(s)

hive> select * from alltypes_parquet where 1=1;
...
Error:
Caused by: parquet.io.InvalidRecordException: cint not found in message table_schema {
}
        at parquet.schema.GroupType.getFieldIndex(GroupType.java:104)
        at parquet.schema.GroupType.getType(GroupType.java:136)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:93)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:205)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)

So what am I missing? After the CREATE TABLE, the catalog info lists all six 
columns, yet the schema printed in the error message is empty, so the two seem 
at odds.

Thanks,
~Remus

PS. alltypesorc is the test ORC table based on data from <enlistment>\data\files\alltypesorc
