Hi, I have uploaded a few CSV files from Windows into Hive and created a few external tables on top of them. When I run a join on two of the tables, one of the int columns comes back as 0. The structure of the tables is as follows:

Table-1        Table-2
-------        -------
Id(int)        id(int)   datetime     eid(int)
-------        -------   ----------   --------
1              1         2011-02-01   3
2              1         2011-03-01   4
3              2         2011-04-01   5
               4         2011-05-01   6
               6         2011-06-01   7

The join query is:

select a.* from Table-2 a join Table-1 b on (a.id=b.id);

The output is:

1   2011-02-01   0
1   2011-03-01   0
2   2011-04-01   0

I checked the logs and noticed the following warning:

WARN org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct: Extra bytes detected at the end of the row! Ignoring similar problems.

Could this be causing it?

When I set hive.auto.convert.join=true, the problem goes away, as there is no reduce phase. The output is then:

1   2011-02-01   3
1   2011-03-01   4
2   2011-04-01   5

Could somebody please help me figure out why we get the wrong results when the join runs through the reducer?

--
Thanks
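
P.S. In case it helps anyone reproduce this, here is a minimal sketch of the setup described above. The table names (table1, table2), the HDFS locations, the comma delimiter and the STRING type for the datetime column are placeholders and assumptions, not the exact DDL that was used.

```sql
-- Hypothetical table names and locations; a comma delimiter is assumed.
CREATE EXTERNAL TABLE table1 (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/external/table1';

-- The datetime column type is assumed to be STRING.
CREATE EXTERNAL TABLE table2 (id INT, `datetime` STRING, eid INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/external/table2';

-- Reduce-side join: this is the case where eid is reported as 0.
SET hive.auto.convert.join=false;
SELECT a.* FROM table2 a JOIN table1 b ON (a.id = b.id);

-- Map-side join: this is the case where eid is reported correctly.
SET hive.auto.convert.join=true;
SELECT a.* FROM table2 a JOIN table1 b ON (a.id = b.id);
```

Running the same SELECT under both settings should make it easy to compare the two outputs side by side.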