Hi David,

I am more familiar with ACID v2 :( What I would do is run an update operation with your version of Hive and see how it handles this case.
Would be nice to hear back from you if you find something.

Thanks,
Peter
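(For reference, a rough Python model of what Peter describes further down for ACID v2: an UPDATE becomes a delete event plus an independent insert event. The split_update helper and the event dicts are illustrative only, not a Hive API.)

    def split_update(original_txn, bucket, row_id, new_row, current_txn, new_row_id):
        """Model an ACID v2 UPDATE as a delete event plus an insert event."""
        delete_event = {
            "operation": 2,                       # delete
            "originalTransaction": original_txn,  # key of the row being removed
            "bucket": bucket,
            "rowId": row_id,
            "currentTransaction": current_txn,
            "row": None,                          # delete deltas need not carry the {row} struct
        }
        insert_event = {
            "operation": 0,                       # insert: a new, independent row
            "originalTransaction": current_txn,
            "bucket": bucket,
            "rowId": new_row_id,
            "currentTransaction": current_txn,
            "row": new_row,
        }
        return delete_event, insert_event

    # The delete event is written to a delete_delta_* directory,
    # the insert event to a delta_* directory.
    d, i = split_update(198994, 0, 14, {"id": 764925}, 199073, 0)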
> On Feb 5, 2020, at 16:55, David Morin <morin.david....@gmail.com> wrote:
> 
> Hello,
> 
> Thanks.
> In fact I use HDP 2.6.5 and the previous ORC version, with "transactionid" for example, and the update flag.
> Sorry, with the row__id it would have been easier.
> 
> So, here is the content of the ORC files (from hive --orcfiledump):
> hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000
> {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
> 
> hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000
> {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
> 
> => When I execute a SELECT statement on the PK I see 2 records. The link is not made.
> Still, the metadata looks good.
> The only difference I see is that in the ORC files I mix operations.
> When I perform a MERGE statement, for example, 2 directories are created:
> hdfs://XXXX/delta_0199073_0199073_0000
> hdfs://XXXX/delta_0199073_0199073_0002
> The first one contains updates (operation:1) and the second one inserts (operation:0).
> 
> And here are the row__id values:
> 
> (related file: hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000)
> row__id {"transactionid":198994,"bucketid":0,"rowid":14}
> ts 1580751316822642886 (5h35)
> id 764925
> 
> (related file: hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000)
> row__id {"transactionid":198994,"bucketid":0,"rowid":14}
> ts 1580757191368914630 (7h13)
> id 764925
> 
> => And I get one duplicated value if I execute a query based on the id (primary key).
> 
> So I don't understand, because transactionid = originalTransaction and all the metadata seems to be good.
> Probably a problem in the sort, but I follow the rule that data are ordered by originalTransaction, bucketId, rowId ascending and currentTransaction descending. It works pretty well except for some tables with lots of updates.
> The only thing I can see at the moment is the fact that I mix different types of operations in one bucket. The MERGE query, for example, creates different directories (one per operation).
> 
> David
> 
> 
> On 2020/02/05 12:22:28, Peter Vary <pv...@cloudera.com> wrote:
>> Hi David,
>> 
>> There is no tombstone for the updated record.
>> In ACID v2 there is no update for the rows, only insert and delete. So an update is handled as: delete the (old) row, insert the (new/independent) row.
>> The delete is stored in the delete delta directories, and the file does not have to contain the {row} struct at the end.
>> 
>> Hope this helps,
>> Peter
>> 
>>> On Feb 5, 2020, at 09:39, David Morin <morin.david....@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It works pretty well but... still, problems sometimes occur.
>>> Do we have to separate operations?
>>> 
>>> Here is the content of the ORC files:
>>> hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000
>>> {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
>>> 
>>> hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000
>>> {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
>>> 
>>> => When I execute a SELECT statement on the PK I see 2 records. The link is not made.
>>> Still, the metadata looks good.
>>> The only difference I see is that in the ORC files I mix operations.
>>> When I perform a MERGE statement, for example, 2 directories are created:
>>> hdfs://XXXX/delta_0199073_0199073_0000
>>> hdfs://XXXX/delta_0199073_0199073_0002
>>> The first one contains updates (operation:1) and the second one inserts (operation:0).
>>> 
>>> Thanks for your help
>>> David
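(The rule David states above can be checked mechanically. Here is a minimal Python sketch of the expected collapse, assuming the reader sorts events by (originalTransaction, bucket, rowId) ascending with currentTransaction descending and keeps the first event per key; the field subsets and helper names are illustrative:)

    from itertools import groupby

    # The two events from the dumps above, reduced to the fields that matter.
    events = [
        {"operation": 0, "originalTransaction": 198994, "bucket": 0,
         "rowId": 14, "currentTransaction": 198994},
        {"operation": 1, "originalTransaction": 198994, "bucket": 0,
         "rowId": 14, "currentTransaction": 199073},
    ]

    def row_key(e):
        return (e["originalTransaction"], e["bucket"], e["rowId"])

    # Ascending on the key, descending on currentTransaction, so the newest
    # version of each row comes first within its group.
    events.sort(key=lambda e: (row_key(e), -e["currentTransaction"]))

    # Keep only the first (newest) event per key.
    latest = [next(g) for _, g in groupby(events, key=row_key)]
    print(latest)  # a single event: the operation:1 record from txn 199073

Under that collapse the two dumped events share the key (198994, 0, 14) and only one version survives, so a SELECT returning two rows suggests the events are never merged into one sorted stream, rather than a metadata problem.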
>>> On 2019/12/01 16:57:08, David Morin <morin.david....@gmail.com> wrote:
>>>> Hi Peter,
>>>> 
>>>> At the moment I have a pipeline based on Flink to write ORC files. These ORC files can be read from Hive thanks to external tables, and then a MERGE statement (triggered by Oozie) pushes this data into tables managed by Hive (transactional tables => ORC). The Hive version is 2.1 because this is the one provided by HDP 2.6.5.
>>>> We've developed a system that writes Hive delta files for the managed tables directly from Flink.
>>>> The current streaming APIs for Hive 2 are not suitable for our needs and we cannot use the new Hive 3 streaming API yet. This system uses the Flink state to store Hive metadata (originalTransaction, bucket, rowId, ...).
>>>> Thanks for your reply, because yes, when files are ordered by originalTransaction, bucket, rowId it works! I just have to use 1 transaction instead of 2 at the moment and it will be OK.
>>>> 
>>>> Thanks
>>>> David
>>>> 
>>>> On 2019/11/29 11:18:05, Peter Vary <pv...@cloudera.com> wrote:
>>>>> Hi David,
>>>>> 
>>>>> Not entirely sure what you are doing here :), my guess is that you are trying to write ACID tables outside of Hive. Am I right? What is the exact use case? There might be better solutions out there than writing the files by hand.
>>>>> 
>>>>> As for your question below: yes, the files should be ordered by the originalTransaction, bucket, rowId triple, otherwise you will get wrong results.
>>>>> 
>>>>> Thanks,
>>>>> Peter
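(Since the delta files here are written outside Hive, the ordering rule Peter states can be sanity-checked before the files are shipped. A minimal sketch in Python over JSON event lines like the ones in the dumps; check_delta_order is a made-up name, and the descending currentTransaction tie-break follows the rule David applies above:)

    import json

    def check_delta_order(lines):
        """Flag events that break the (originalTransaction, bucket, rowId)
        ascending order, with currentTransaction descending as tie-break."""
        prev = None
        for n, line in enumerate(lines, 1):
            e = json.loads(line)
            cur = (e["originalTransaction"], e["bucket"], e["rowId"],
                   -e["currentTransaction"])
            if prev is not None and cur < prev:
                print("out of order at event", n, ":", cur, "after", prev)
            prev = cur

    # The delete delta in the Nov 19 dump below lists rowId 1 before rowId 0
    # within the same transaction, which this check flags.
    check_delta_order([
        '{"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}',
        '{"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}',
    ])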
>>>>>> On Nov 19, 2019, at 13:30, David Morin <morin.david....@gmail.com> wrote:
>>>>>> 
>>>>>> Here are more details about the ORC content and the duplicate rows we get:
>>>>>> 
>>>>>> /delta_0011365_0011365_0000/bucket_00003
>>>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
>>>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
>>>>>> 
>>>>>> /delta_0011368_0011368_0000/bucket_00003
>>>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
>>>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
>>>>>> 
>>>>>> /delta_0011369_0011369_0000/bucket_00003
>>>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
>>>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
>>>>>> 
>>>>>> +-------------------------------------------------+-------+--+
>>>>>> | row__id                                         | cle   |
>>>>>> +-------------------------------------------------+-------+--+
>>>>>> | {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
>>>>>> | {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
>>>>>> | {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
>>>>>> | {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
>>>>>> | {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
>>>>>> | {"transactionid":11365,"bucketid":3,"rowid":0}  | 5218  |
>>>>>> | {"transactionid":11365,"bucketid":3,"rowid":1}  | 5216  |
>>>>>> | {"transactionid":11369,"bucketid":3,"rowid":1}  | 5216  |
>>>>>> | {"transactionid":11369,"bucketid":3,"rowid":0}  | 5218  |
>>>>>> | {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
>>>>>> | {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
>>>>>> | {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
>>>>>> +-------------------------------------------------+-------+--+
>>>>>> 
>>>>>> As you can see, we have duplicate rows for column "cle" values 5216 and 5218.
>>>>>> Do we have to keep the rowids ordered? Because this is the only difference I have noticed, based on some tests with Beeline.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> 
>>>>>> On Tue, Nov 19, 2019 at 00:18, David Morin <morin.david....@gmail.com <mailto:morin.david....@gmail.com>> wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> I'm trying to understand the purpose of the rowid column inside an ORC delta file:
>>>>>> {"transactionid":11359,"bucketid":5,"rowid":0}
>>>>>> ORC view:
>>>>>> {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
>>>>>> I use HDP 2.6 => Hive 2
>>>>>> 
>>>>>> If I want to be idempotent with INSERT / DELETE / INSERT, do we have to keep the same rowid?
>>>>>> It seems that when the rowid is changed during the second INSERT I get a duplicate row.
>>>>>> To me, I should be able to create a new rowid for the new transaction during the second INSERT, but that seems to generate duplicate records.
>>>>>> 
>>>>>> Regards,
>>>>>> David
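(On the rowid question above, a rough model of how the Nov 19 events should resolve, assuming delete events cancel exactly the (originalTransaction, bucket, rowId) they name and each re-insert carries a fresh key under its own transaction; this is a sketch, not Hive's actual merger:)

    # Insert events from delta_0011365 and delta_0011369 (bucket 3 only).
    inserts = [
        {"originalTransaction": 11365, "bucket": 3, "rowId": 0, "cle": 5218},
        {"originalTransaction": 11365, "bucket": 3, "rowId": 1, "cle": 5216},
        {"originalTransaction": 11369, "bucket": 3, "rowId": 1, "cle": 5216},
        {"originalTransaction": 11369, "bucket": 3, "rowId": 0, "cle": 5218},
    ]

    # The delete events from delta_0011368 name the exact keys they remove.
    tombstones = {(11365, 3, 1), (11365, 3, 0)}

    live = [r for r in inserts
            if (r["originalTransaction"], r["bucket"], r["rowId"]) not in tombstones]
    for r in live:
        print(r)  # only the txn-11369 rows survive: cle 5216 and 5218 once each

Under this model a new rowid for the second INSERT is fine by itself, because the re-inserted rows carry their own key (11369, 3, ...). The duplicates in the Beeline output mean the txn-11365 inserts were never cancelled by the deletes, which is consistent with the ordering issue discussed above rather than with the rowid choice itself.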