Hello,
I have the following issue. I created a Parquet file through parquet-cascading and want to load it into a Hive table. My data contains values of type timestamp, but the Cascading Parquet scheme does not support the timestamp type, so when writing the Parquet file I declared the field as binary. The generated file loads into Hive without error; however, when creating the Hive table I declared the column as timestamp, and the select then fails as shown below.

Code:

package com.parquet.TimestampTest;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class GenrateTimeStampParquetFile {

    static String inputPath = "target/input/timestampInputFile1";
    static String outputPath = "target/parquetOutput/TimestampOutput";

    public static void main(String[] args) {
        write();
    }

    private static void write() {
        // Source: newline-delimited text file with a single string field
        Fields field = new Fields("timestampField").applyTypes(String.class);
        Scheme sourceSch = new TextDelimited(field, false, "\n");

        // Sink: Parquet file whose schema declares the field as binary,
        // since timestamp is not supported
        Fields outputField = new Fields("timestampField");
        Scheme sinkSch = new ParquetTupleScheme(field, outputField,
                "message TimeStampTest { optional binary timestampField; }");

        Tap source = new Hfs(sourceSch, inputPath);
        Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Hive timestamp");

        FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(fd).complete();
    }
}

Input file (timestampInputFile1):

timestampField
1988-05-25 15:15:15.254
1987-05-06 14:14:25.362

After running the code, the following files are generated:

1. part-00000-m-00000.parquet
2. _SUCCESS
3. _metadata
4. _common_metadata

I then created a table in Hive and loaded part-00000-m-00000.parquet into it:

Query:

hive> create table test3(timestampField timestamp) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
hive> select * from test3;

Output:

OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

So the select fails with the ClassCastException above. Please help me solve this problem.

I am currently using:

Hive 1.1.0-cdh5.4.2
Cascading 2.5.1
parquet-format-2.2.0

Thanks,
Santlal J. Gupta
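P.S. My guess at the cause: Hive's Parquet SerDe expects timestamp columns to be stored as INT96, so when it reads my binary column it gets back a BytesWritable where a TimestampWritable is expected, hence the ClassCastException. One workaround I am considering (an untested sketch; the table name test4 is just an example) is to declare the Hive column as string and cast at query time:

hive> create table test4(timestampField string) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test4;
hive> select cast(timestampField as timestamp) from test4;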
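Alternatively, should I be annotating the binary field as UTF8 in the Parquet schema, so that readers at least know it holds a string? A sketch of the only line in my code that would change (again untested, and I don't know whether Hive would then read it as a timestamp):

Scheme sinkSch = new ParquetTupleScheme(field, outputField,
        "message TimeStampTest { optional binary timestampField (UTF8); }");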