only timestamp column value of previous row gets reset

Ujjwal Wed, 27 May 2015 13:20:50 -0700

Hi,



I want to cross-check a scenario with you and make sure it's not a problem
on my end.


I am trying to do an HCatalog read on an edge node and I am seeing strange
behavior with the timestamp data type. My Hive version is 0.13.0.2.



First, this is how the documentation suggests doing the read
(https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter):



for (InputSplit split : readCntxt.getSplits()) {
    // One reader per split, using the configuration from the same context.
    HCatReader reader = DataTransferFactory.getHCatReader(split, readCntxt.getConf());
    Iterator<HCatRecord> itr = reader.read();
    while (itr.hasNext()) {
        HCatRecord read = itr.next();
    }
}


I am storing the iterator *read* into a buffer for later use in main().
Later I access this iterator from the stored buffer and drain it by
printing out the rows in another thread, and I see the following behavior.
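
The buffer-then-drain pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual reader code: rows are stood in for by plain lists of strings instead of HCatRecord, and the class and method names (BufferedDrainDemo, fill, drainInThread, demo) are hypothetical. The copy made in fill is shallow, so it only guards against the row container being changed later, not against changes to mutable field values inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BufferedDrainDemo {
    // Main-thread side: copy every row from the iterator into the buffer.
    static void fill(Iterator<List<Object>> rows, BlockingQueue<List<Object>> buffer) {
        while (rows.hasNext()) {
            // Shallow snapshot of the row container; field objects are shared, not cloned.
            buffer.add(new ArrayList<>(rows.next()));
        }
    }

    // Consumer side: drain the buffer in a separate thread and collect the rendered rows.
    static List<String> drainInThread(BlockingQueue<List<Object>> buffer) {
        List<String> out = new ArrayList<>();
        Thread t = new Thread(() -> {
            List<Object> row;
            while ((row = buffer.poll()) != null) {
                out.add("Read in thread - " + row);
            }
        });
        t.start();
        try {
            t.join(); // wait for the drain to finish before reading 'out'
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    // Hypothetical sample data modelled on the rows shown in the post.
    static List<String> demo() {
        List<List<Object>> source = new ArrayList<>();
        source.add(Arrays.<Object>asList("9779-11-21", "2014-04-01 11:30:55.0", "abc"));
        source.add(Arrays.<Object>asList("9779-11-21", "2014-04-04 11:30:55.0", "def"));
        source.add(Arrays.<Object>asList(null, null));
        BlockingQueue<List<Object>> buffer = new LinkedBlockingQueue<>();
        fill(source.iterator(), buffer);
        return drainInThread(buffer);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```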



“The column value of data type *timestamp* in a previous row gets
reset to *1969-12-31 19:00:00.0* when the column value in the current row
is *null*. Columns of other data types in the previous row are not affected
by a *null* in the current row. Changing the order of the columns in the
source data doesn’t change the behavior.”
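
For reference, 1969-12-31 19:00:00.0 is what java.sql.Timestamp prints for epoch millisecond 0 when the JVM's default time zone is five hours behind UTC. The sketch below illustrates only that rendering, assuming a zone such as America/New_York (the post does not state the cluster's time zone). The class and method names are made up for illustration, and the sketch says nothing about where in Hive or HCatalog a zero value would come from.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class EpochZeroDemo {
    // Renders epoch milliseconds the way java.sql.Timestamp.toString() does
    // when the JVM default time zone is the given zone (assumed, not from the post).
    static String render(long epochMillis, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Timestamp(epochMillis).toString();
        } finally {
            // Restore the original default so the demo has no lasting side effects.
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // Epoch 0 is 1970-01-01 00:00:00 UTC, i.e. 19:00 the previous day at UTC-5.
        System.out.println(render(0L, "America/New_York")); // prints 1969-12-31 19:00:00.0
    }
}
```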




hive> describe bug;

dtcol                   date

tscol                   timestamp

stcol                   string

Time taken: 0.058 seconds, Fetched: 3 row(s)

hive> select * from bug;
9779-11-21      2014-04-01 11:30:55     abc
9779-11-21      2014-04-04 11:30:55     def
NULL    NULL


Read in thread - 9779-11-21     2014-04-01 11:30:55.0   abc
Read in thread - 9779-11-21     *1969-12-31 19:00:00.0*   def
Read in thread - null   null


Could this be an issue in Hive's timestamp implementation?

Regards,
Ujjwal
