[ https://issues.apache.org/jira/browse/HIVE-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lovekesh bansal updated HIVE-10830: ----------------------------------- Description: 1. create external table platdev.table_target ( id INT, message String, state string, date string ) partitioned by (country string) row format delimited fields terminated by ',' stored as RCFILE location '/user/nikgupta/table_target' ; 2. insert overwrite table platdev.table_target partition(country) select case when id=13 then 15 else id end,message,state,date,country from platdev.table_base2 where id between 13 and 16; \n" say now my table is written by default using LazyBinaryColumnarSerDe and has the following data: 15 thirteen delhi 2-12-2014 india 14 fourteen delhi 1-1-2014 india 15 fifteen florida 1-1-2014 us 16 sixteen florida 2-12-2014 us Now If I try to read the data with a mapreduce program, with map function as given below: public void map(LongWritable key, BytesRefArrayWritable val, Context context) throws IOException, InterruptedException { for (int i = 0; i < val.size(); i++) { BytesRefWritable bytesRefread = val.get(i); byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(), bytesRefread.getStart(), bytesRefread.getStart()+bytesRefread.getLength()); Text currentCellStr = new Text(currentCell); System.out.println("rowText="+currentCellStr ); } context.write(NullWritable.get(), bytes); } and set the following job configuration parameters:- job.setInputFormatClass(RCFileMapReduceInputFormat.class); job.setOutputFormatClass(RCFileMapReduceOutputFormat.class); jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5) The output shown is as follows: (LazyBinaryColumnarSerDe) rowText= rowText=fifteen rowText=goa rowText=2-2-2222 rowText=us But exactly the same case using the (ColumnarSerDe) explicitly in the table definition would give the following output: rowText=1 rowText=fifteen rowText=goa rowText=2-2-2222 rowText=us Point is that First column value is missing in the case of LazyBinaryColumnarSerDe. was: 1. create external table platdev.table_target ( id INT, message String, state string, date string ) partitioned by (country string) row format delimited fields terminated by ',' stored as RCFILE location '/user/nikgupta/table_target' ; 2. insert overwrite table platdev.table_target partition(country) select case when id=13 then 15 else id end,message,state,date,country from platdev.table_base2 where id between 13 and 16; \n" say now my table has the following data: 15 thirteen delhi 2-12-2014 india 14 fourteen delhi 1-1-2014 india 15 fifteen florida 1-1-2014 us 16 sixteen florida 2-12-2014 us Now If I try to read the data with a mapreduce program, with map function as given below: public void map(LongWritable key, BytesRefArrayWritable val, Context context) throws IOException, InterruptedException { for (int i = 0; i < val.size(); i++) { BytesRefWritable bytesRefread = val.get(i); byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(), bytesRefread.getStart(), bytesRefread.getStart()+bytesRefread.getLength()); Text currentCellStr = new Text(currentCell); System.out.println("rowText="+currentCellStr ); } context.write(NullWritable.get(), bytes); } and set the following job configuration parameters:- job.setInputFormatClass(RCFileMapReduceInputFormat.class); job.setOutputFormatClass(RCFileMapReduceOutputFormat.class); jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5) The output shown is as follows: rowText= rowText=fifteen rowText=goa rowText=2-2-2222 rowText=us But exactly the same case using the ColumnarSerDe explicitly in the table definition would give the following output: rowText=1 rowText=fifteen rowText=goa rowText=2-2-2222 rowText=us Point is that First column value is missing. > First column of a Hive table created with LazyBinaryColumnarSerDe is not read > properly > -------------------------------------------------------------------------------------- > > Key: HIVE-10830 > URL: https://issues.apache.org/jira/browse/HIVE-10830 > Project: Hive > Issue Type: Bug > Reporter: lovekesh bansal > > 1. create external table platdev.table_target ( id INT, message String, state > string, date string ) partitioned by (country string) row format delimited > fields terminated by ',' stored as RCFILE location > '/user/nikgupta/table_target' ; > 2. insert overwrite table platdev.table_target partition(country) select case > when id=13 then 15 else id end,message,state,date,country from > platdev.table_base2 where id between 13 and 16; \n" > say now my table is written by default using LazyBinaryColumnarSerDe and has > the following data: > 15 thirteen delhi 2-12-2014 india > 14 fourteen delhi 1-1-2014 india > 15 fifteen florida 1-1-2014 us > 16 sixteen florida 2-12-2014 us > Now If I try to read the data with a mapreduce program, with map function as > given below: > public void map(LongWritable key, BytesRefArrayWritable val, Context context) > throws IOException, InterruptedException { > > for (int i = 0; i < val.size(); i++) { > BytesRefWritable bytesRefread = val.get(i); > byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(), > bytesRefread.getStart(), bytesRefread.getStart()+bytesRefread.getLength()); > Text currentCellStr = new Text(currentCell); > System.out.println("rowText="+currentCellStr ); > } > context.write(NullWritable.get(), bytes); > } > and set the following job configuration parameters:- > job.setInputFormatClass(RCFileMapReduceInputFormat.class); > job.setOutputFormatClass(RCFileMapReduceOutputFormat.class); > jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5) > > The output shown is as follows: (LazyBinaryColumnarSerDe) > rowText= > rowText=fifteen > rowText=goa > rowText=2-2-2222 > rowText=us > But exactly the same case using the (ColumnarSerDe) explicitly in the table > definition would give the following output: > rowText=1 > rowText=fifteen > rowText=goa > rowText=2-2-2222 > rowText=us > Point is that First column value is missing in the case of > LazyBinaryColumnarSerDe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)