Hey David,

Good to know it was fixed! :)

Thanks,
Peter

> On Feb 25, 2020, at 12:47, David Morin <morin.david....@gmail.com> wrote:
> 
> Hi Peter,
> 
> Just to give some news about my issue.
> The problem is fixed. It was caused by a reset of the rowId in my application:
> the default batch size of my VectorizedRowBatch (ORC) is 1024, and when this
> batch was reset, the rowId was reset as well.
> It now works as expected.
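> 
> For the record, the shape of the fix is roughly this (a minimal, illustrative
> sketch with made-up class name and column positions, not the actual application
> code): the rowId counter has to live outside the VectorizedRowBatch, so that
> resetting the batch every 1024 rows does not restart it from 0.
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
> import org.apache.orc.OrcFile;
> import org.apache.orc.TypeDescription;
> import org.apache.orc.Writer;
> 
> public class RowIdCounterSketch {
>   public static void main(String[] args) throws Exception {
>     // Simplified ACID-like schema; the real delta file also carries the "row" struct.
>     TypeDescription schema = TypeDescription.fromString(
>         "struct<operation:int,originalTransaction:bigint,bucket:int,"
>         + "rowId:bigint,currentTransaction:bigint>");
>     Writer writer = OrcFile.createWriter(new Path("/tmp/bucket_00000_sketch"),
>         OrcFile.writerOptions(new Configuration()).setSchema(schema));
> 
>     VectorizedRowBatch batch = schema.createRowBatch();  // default max size = 1024
>     LongColumnVector rowIdCol = (LongColumnVector) batch.cols[3];
> 
>     long rowId = 0;  // lives outside the batch, so batch.reset() cannot clear it
>     for (int n = 0; n < 3000; n++) {
>       int i = batch.size++;
>       rowIdCol.vector[i] = rowId++;  // keeps increasing across batch resets
>       // ... fill operation, transaction ids and the row struct here ...
>       if (batch.size == batch.getMaxSize()) {
>         writer.addRowBatch(batch);
>         batch.reset();  // clears the batch only, not the rowId counter
>       }
>     }
>     if (batch.size > 0) {
>       writer.addRowBatch(batch);
>     }
>     writer.close();
>   }
> }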
> 
> Thanks
> David
> 
> On Thu, Feb 6, 2020 at 12:12, David Morin <morin.david....@gmail.com> wrote:
> OK, Peter. No problem, thanks.
> I'll keep you posted.
> 
> On 2020/02/06 09:42:39, Peter Vary <pv...@cloudera.com> wrote:
> > Hi David,
> > 
> > I am more familiar with ACID v2 :(
> > What I would do is run an update operation with your version of Hive and
> > see how it handles this case.
> > 
> > It would be nice to hear back from you if you find something.
> > 
> > Thanks,
> > Peter
> > 
> > > On Feb 5, 2020, at 16:55, David Morin <morin.david....@gmail.com> wrote:
> > > 
> > > Hello,
> > > 
> > > Thanks.
> > > In fact I use HDP 2.6.5 and the previous ORC version, with transactionid
> > > (for example) and the update flag.
> > > Sorry, with the row__id it would have been easier.
> > > 
> > > So, here is the content of the ORC files (from hive --orcfiledump):
> > > hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000 
> > > {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
> > > 
> > > hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 
> > > {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
> > > 
> > > => When I execute a SELECT statement on the PK I see 2 records: the link
> > > between them is not made.
> > > Still, the metadata looks good.
> > > The only difference I see is that I mix operations in the ORC files.
> > > When I perform a MERGE statement, for example, 2 directories are created:
> > > hdfs://XXXX/delta_0199073_0199073_0000
> > > hdfs://XXXX/delta_0199073_0199073_0002
> > > The first one contains updates (operation:1) and the second one inserts
> > > (operation:0).
> > > 
> > > And here are the row__id values:
> > > 
> > > (related file: hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000)
> > > row__id    {"transactionid":198994,"bucketid":0,"rowid":14}
> > > ts         1580751316822642886 (5h35)
> > > id         764925
> > > 
> > > (related file: hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 )
> > > row__id    {"transactionid":198994,"bucketid":0,"rowid":14}
> > > ts         1580757191368914630 (7h13)
> > > id         764925
> > > 
> > > => And I get one duplicated value when I execute a query based on the id
> > > (primary key).
> > > 
> > > So I don't understand, because transactionid = originalTransaction and all
> > > the metadata seems to be good.
> > > It is probably a problem in the sort, but I follow the rule that data are
> > > ordered by originalTransaction, bucketId, rowId ascending and
> > > currentTransaction descending. It works pretty well except for some tables
> > > with a lot of updates.
> > > The only thing I can see at the moment is the fact that I mix different
> > > types of operations in one bucket. The MERGE query, for example, creates
> > > different directories (one per operation).
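> > > 
> > > For reference, the ordering I apply can be expressed as a comparator like
> > > this (AcidRow is just an illustrative holder for the metadata fields, not a
> > > real Hive class):
> > > 
> > > import java.util.Comparator;
> > > 
> > > class AcidRow {
> > >   long originalTransaction;
> > >   int bucket;
> > >   long rowId;
> > >   long currentTransaction;
> > > }
> > > 
> > > class AcidOrdering {
> > >   // originalTransaction, bucket, rowId ascending; currentTransaction descending
> > >   static final Comparator<AcidRow> ACID_SORT = Comparator
> > >       .comparingLong((AcidRow r) -> r.originalTransaction)
> > >       .thenComparingInt(r -> r.bucket)
> > >       .thenComparingLong(r -> r.rowId)
> > >       .thenComparing(
> > >           Comparator.comparingLong((AcidRow r) -> r.currentTransaction).reversed());
> > > }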
> > > 
> > > David
> > > 
> > > 
> > > On 2020/02/05 12:22:28, Peter Vary <pv...@cloudera.com> wrote:
> > >> Hi David,
> > >> 
> > >> There is no tombstone for the updated record.
> > >> In ACID v2 there is no update of rows, only insert and delete, so an update
> > >> is handled as a delete of the old row plus an insert of the new, independent
> > >> row.
> > >> The deletes are stored in the delete delta directories, and those files do
> > >> not have to contain the {row} struct at the end.
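> > >> 
> > >> Schematically (illustrative values only, same layout as your orcfiledump
> > >> output), an UPDATE in ACID v2 therefore produces two events:
> > >> 
> > >> delete_delta_.../bucket_00000:
> > >> {"operation":2,"originalTransaction":<txn that wrote the old row>,"bucket":...,"rowId":<old rowId>,"currentTransaction":<updating txn>,"row":null}
> > >> 
> > >> delta_.../bucket_00000:
> > >> {"operation":0,"originalTransaction":<updating txn>,"bucket":...,"rowId":<new rowId>,"currentTransaction":<updating txn>,"row":{<new values>}}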
> > >> 
> > >> Hope this helps,
> > >> Peter
> > >> 
> > >>> On Feb 5, 2020, at 09:39, David Morin <morin.david....@gmail.com> wrote:
> > >>> 
> > >>> Hi,
> > >>> 
> > >>> It works pretty well, but problems still occur sometimes.
> > >>> Do we have to separate operations?
> > >>> 
> > >>> Here is the content of the ORC files:
> > >>> hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000 
> > >>> {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
> > >>> 
> > >>> hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 
> > >>> {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
> > >>> 
> > >>> => When I execute a SELECT statement on the PK I see 2 records: the link
> > >>> between them is not made.
> > >>> Still, the metadata looks good.
> > >>> The only difference I see is that I mix operations in the ORC files.
> > >>> When I perform a MERGE statement, for example, 2 directories are created:
> > >>> hdfs://XXXX/delta_0199073_0199073_0000
> > >>> hdfs://XXXX/delta_0199073_0199073_0002
> > >>> The first one contains updates (operation:1) and the second one inserts
> > >>> (operation:0).
> > >>> 
> > >>> Thanks for your help
> > >>> David
> > >>> 
> > >>> On 2019/12/01 16:57:08, David Morin <morin.david....@gmail.com> wrote:
> > >>>> Hi Peter,
> > >>>> 
> > >>>> At the moment I have a pipeline based on Flink to write ORC files.
> > >>>> These ORC files can be read from Hive through external tables, and then a
> > >>>> MERGE statement (triggered by Oozie) pushes the data into tables managed by
> > >>>> Hive (transactional tables => ORC). The Hive version is 2.1 because this is
> > >>>> the one provided by HDP 2.6.5.
> > >>>> We've developed a system that writes the Hive delta files for the managed
> > >>>> tables directly from Flink.
> > >>>> The current streaming APIs for Hive 2 are not suitable for our needs and we
> > >>>> cannot use the new Hive 3 streaming API yet. This system uses Flink state to
> > >>>> store the Hive metadata (originalTransaction, bucket, rowId, ...).
> > >>>> Thanks for your reply, because yes, when the files are ordered by
> > >>>> originalTransaction, bucket, rowId, it works! I just have to use 1
> > >>>> transaction instead of 2 at the moment and it will be OK.
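> > >>>> 
> > >>>> Roughly, the bookkeeping looks like this (a plain-Java sketch with made-up
> > >>>> names; in the real pipeline it lives in Flink keyed state rather than in a
> > >>>> HashMap): per primary key we remember where the current version of the row
> > >>>> sits in the table, so that a later delete/update can point at it.
> > >>>> 
> > >>>> import java.util.HashMap;
> > >>>> import java.util.Map;
> > >>>> 
> > >>>> class RowPosition {
> > >>>>   final long originalTransaction;
> > >>>>   final int bucket;
> > >>>>   final long rowId;
> > >>>> 
> > >>>>   RowPosition(long originalTransaction, int bucket, long rowId) {
> > >>>>     this.originalTransaction = originalTransaction;
> > >>>>     this.bucket = bucket;
> > >>>>     this.rowId = rowId;
> > >>>>   }
> > >>>> }
> > >>>> 
> > >>>> class AcidBookkeeping {
> > >>>>   private final Map<Long, RowPosition> byPrimaryKey = new HashMap<>();
> > >>>> 
> > >>>>   // On insert: remember where the new row was written.
> > >>>>   void recordInsert(long pk, long writeTxn, int bucket, long rowId) {
> > >>>>     byPrimaryKey.put(pk, new RowPosition(writeTxn, bucket, rowId));
> > >>>>   }
> > >>>> 
> > >>>>   // On update/delete: the delete event must reference the existing position.
> > >>>>   RowPosition positionFor(long pk) {
> > >>>>     return byPrimaryKey.get(pk);
> > >>>>   }
> > >>>> }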
> > >>>> 
> > >>>> Thanks
> > >>>> David
> > >>>> 
> > >>>> On 2019/11/29 11:18:05, Peter Vary <pv...@cloudera.com> wrote:
> > >>>>> Hi David,
> > >>>>> 
> > >>>>> I am not entirely sure what you are doing here :). My guess is that you
> > >>>>> are trying to write ACID tables outside of Hive. Am I right? What is the
> > >>>>> exact use case? There might be better solutions out there than writing the
> > >>>>> files by hand.
> > >>>>> 
> > >>>>> As for your question below: yes, the files should be ordered by the
> > >>>>> originalTransaction, bucket, rowId triple, otherwise you will get wrong
> > >>>>> results.
> > >>>>> 
> > >>>>> Thanks,
> > >>>>> Peter
> > >>>>> 
> > >>>>>> On Nov 19, 2019, at 13:30, David Morin <morin.david....@gmail.com> wrote:
> > >>>>>> 
> > >>>>>> Here are more details about the ORC content and the duplicate rows we see:
> > >>>>>> 
> > >>>>>> /delta_0011365_0011365_0000/bucket_00003
> > >>>>>> 
> > >>>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
> > >>>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
> > >>>>>> 
> > >>>>>> 
> > >>>>>> /delta_0011368_0011368_0000/bucket_00003
> > >>>>>> 
> > >>>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
> > >>>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
> > >>>>>> 
> > >>>>>> /delta_0011369_0011369_0000/bucket_00003
> > >>>>>> 
> > >>>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
> > >>>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
> > >>>>>> 
> > >>>>>> +-------------------------------------------------+-------+--+
> > >>>>>> |                     row__id                     |  cle  |
> > >>>>>> +-------------------------------------------------+-------+--+
> > >>>>>> | {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
> > >>>>>> | {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
> > >>>>>> | {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
> > >>>>>> | {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
> > >>>>>> | {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
> > >>>>>> | {"transactionid":11365,"bucketid":3,"rowid":0}  | 5218  |
> > >>>>>> | {"transactionid":11365,"bucketid":3,"rowid":1}  | 5216  |
> > >>>>>> | {"transactionid":11369,"bucketid":3,"rowid":1}  | 5216  |
> > >>>>>> | {"transactionid":11369,"bucketid":3,"rowid":0}  | 5218  |
> > >>>>>> | {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
> > >>>>>> | {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
> > >>>>>> | {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
> > >>>>>> +-------------------------------------------------+-------+--+
> > >>>>>> 
> > >>>>>> As you can see, we have duplicate rows for "cle" values 5216 and 5218.
> > >>>>>> Do we have to keep the rowIds ordered? Because this is the only difference
> > >>>>>> I have noticed, based on some tests with beeline.
> > >>>>>> 
> > >>>>>> Thanks
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>>>> On Tue, Nov 19, 2019 at 00:18, David Morin <morin.david....@gmail.com> wrote:
> > >>>>>> Hello,
> > >>>>>> 
> > >>>>>> I'm trying to understand the purpose of the rowid column inside an ORC
> > >>>>>> delta file:
> > >>>>>> {"transactionid":11359,"bucketid":5,"rowid":0}
> > >>>>>> Orc view: 
> > >>>>>> {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
> > >>>>>> I use HDP 2.6 => Hive 2
> > >>>>>> 
> > >>>>>> I want to be idempotent with an INSERT / DELETE / INSERT sequence:
> > >>>>>> do we have to keep the same rowid?
> > >>>>>> It seems that when the rowid is changed during the second INSERT I get a
> > >>>>>> duplicate row.
> > >>>>>> I could create a new rowid for the new transaction during the second
> > >>>>>> INSERT, but that seems to generate duplicate records.
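> > >>>>>> 
> > >>>>>> To make the question concrete (made-up transaction ids T1 < T2 < T3, same
> > >>>>>> JSON layout as orcfiledump shows), the INSERT / DELETE / INSERT sequence for
> > >>>>>> one key K would produce something like:
> > >>>>>> 
> > >>>>>> {"operation":0,"originalTransaction":T1,"bucket":b,"rowId":r1,"currentTransaction":T1,"row":{K,...}}
> > >>>>>> {"operation":2,"originalTransaction":T1,"bucket":b,"rowId":r1,"currentTransaction":T2,"row":null}
> > >>>>>> {"operation":0,"originalTransaction":T3,"bucket":b,"rowId":r2,"currentTransaction":T3,"row":{K,...}}
> > >>>>>> 
> > >>>>>> The question is whether r2 has to be equal to r1, or whether it can be
> > >>>>>> assigned freshly within transaction T3.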
> > >>>>>> 
> > >>>>>> Regards,
> > >>>>>> David
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>>> 
> > >>>>> 
> > >>>> 
> > >> 
> > >> 
> > 
> > 
