Thanks. I used INSERT ... SELECT instead and it works fine.

-----Original Message-----
From: Brock Noland [mailto:br...@cloudera.com]
Sent: Wednesday, February 05, 2014 4:46 PM
To: dev@hive.apache.org
Subject: Re: How is the STORED AS PARQUET used?
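[For reference, the two-step workaround mentioned above — an explicit CREATE TABLE followed by INSERT ... SELECT, instead of the not-yet-supported CTAS — might look like the following sketch. It reuses the table and column names from the failing CTAS quoted below; the exact DDL is an illustration, not copied verbatim from the thread.]

```sql
-- Sketch: declare the Parquet table's schema explicitly instead of
-- relying on CTAS (which HIVE-6375 tracks), then populate it from the
-- ORC source table with INSERT ... SELECT.
CREATE TABLE alltypes_parquet (
  cint INT,
  ctinyint TINYINT,
  csmallint SMALLINT,
  cdouble DOUBLE,
  cfloat FLOAT,
  cstring1 STRING)
STORED AS PARQUET;

INSERT OVERWRITE TABLE alltypes_parquet
SELECT cint, ctinyint, csmallint, cdouble, cfloat, cstring1
FROM alltypesorc;
```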
Hi,

CTAS needs to be implemented for Parquet + Hive. There are more details here:
https://issues.apache.org/jira/browse/HIVE-6375

For a basic guide, I'd look at the following files in the patch:
parquet_partitioned.q and parquet_create.q

I have working on the Parquet documentation on my calendar for Thursday/Friday.

Brock

On Wed, Feb 5, 2014 at 8:27 AM, Remus Rusanu <rem...@microsoft.com> wrote:
> Hello all,
>
> I tried the following on a build that has the latest HIVE-5783 patch
> applied over trunk:
>
> hive> set hive.aux.jars.path=file:///usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/parquet-hadoop-bundle-1.3.2.jar;
> hive> create table alltypes_parquet stored as parquet as select
>       cint, ctinyint, csmallint, cdouble, cfloat, cstring1 from alltypesorc;
> hive> show create table alltypes_parquet;
> OK
> CREATE TABLE `alltypes_parquet`(
>   `cint` int COMMENT 'from deserializer',
>   `ctinyint` tinyint COMMENT 'from deserializer',
>   `csmallint` smallint COMMENT 'from deserializer',
>   `cdouble` double COMMENT 'from deserializer',
>   `cfloat` float COMMENT 'from deserializer',
>   `cstring1` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/alltypes_parquet'
> TBLPROPERTIES (
>   'numFiles'='1',
>   'transient_lastDdlTime'='1391609238',
>   'COLUMN_STATS_ACCURATE'='true',
>   'totalSize'='256959',
>   'numRows'='12288',
>   'rawDataSize'='73728')
> Time taken: 0.256 seconds, Fetched: 22 row(s)
>
> hive> select * from alltypes_parquet where 1=1;
> ...
> Error:
> Caused by: parquet.io.InvalidRecordException: cint not found in
> message table_schema { }
>         at parquet.schema.GroupType.getFieldIndex(GroupType.java:104)
>         at parquet.schema.GroupType.getType(GroupType.java:136)
>         at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:93)
>         at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:205)
>         at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
>         at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
>         at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
>
> So what am I missing? The catalog info seems at odds with the record
> structure after CREATE TABLE.
>
> Thanks,
> ~Remus
>
> PS. alltypesorc is the test ORC table based on data from
> <enlistment>\data\files\alltypesorc

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org