Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
Hello, Thanks. In fact I use HDP 2.6.5 and an earlier ORC version, with transactionid for example and the update flag. Sorry, with the row__id it would have been easier. So, here is the ORC file content (with hive --orcfiledump): hive --orcfiledump hdfs:///delta_0198994_0198994_/bucket_000

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread Peter Vary
Hi David, There is no tombstone for the updated record. In ACID v2 there is no update operation for rows, only insert and delete. So an update is handled as a delete of the (old) row plus an insert of a (new/independent) row. The delete is stored in the delete delta directories, and the file does not have to contain the {row}
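
To illustrate the delete-plus-insert model described above, here is a minimal Python sketch. It is not Hive code: the names RowKey, apply_update and visible_rows are hypothetical, the transaction ids and row values are made up, and the in-memory dict and set only stand in for the insert deltas and delete deltas. It assumes a row is identified by (originalTransaction, bucket, rowId), the fields that appear in the orcfiledump output in this thread.

```python
from typing import Dict, NamedTuple, Set


class RowKey(NamedTuple):
    """Hypothetical row identity: (originalTransaction, bucket, rowId)."""
    original_transaction: int
    bucket: int
    row_id: int


def apply_update(inserts: Dict[RowKey, dict], deletes: Set[RowKey],
                 old_key: RowKey, new_key: RowKey, new_row: dict) -> None:
    """Model an update as a delete of the old row plus an independent insert."""
    deletes.add(old_key)        # stands in for a record in a delete delta
    inserts[new_key] = new_row  # stands in for a record in an insert delta


def visible_rows(inserts: Dict[RowKey, dict],
                 deletes: Set[RowKey]) -> Dict[RowKey, dict]:
    """Return the inserted rows whose key has no matching delete event."""
    return {key: row for key, row in inserts.items() if key not in deletes}


# Illustrative data only: one row written by transaction 100.
inserts = {RowKey(100, 0, 14): {"id": 1, "value": "old"}}
deletes: Set[RowKey] = set()

# Transaction 101 "updates" the row: delete the old key, insert a new row.
apply_update(inserts, deletes,
             old_key=RowKey(100, 0, 14),
             new_key=RowKey(101, 0, 0),
             new_row={"id": 1, "value": "new"})

result = visible_rows(inserts, deletes)
```

In this sketch the old row is still physically present in inserts. It only disappears from the result because a delete event carries the same key. If the delete event were written with a key that does not match the old row, both versions would stay visible, which is one way a duplicate record could show up.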

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
Hi, It works pretty well, but problems still occur sometimes. Do we have to separate operations? Here is the ORC file content: hive --orcfiledump hdfs:///delta_0198994_0198994_/bucket_0 {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"r