Hi David,

There is no tombstone for the updated record. In ACID v2 there is no update operation for rows, only insert and delete, so an update is handled as a delete of the old row plus an insert of the new (independent) row. The deletes are stored in the delete delta directories, and those files do not have to contain the {row} struct at the end.
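For illustration (the directory names and values here are made up, not taken from your table), a delete event in a delete delta would dump roughly like this, with operation 2 and a null row, while the new version of the row is a plain insert in a normal delta:

hive --orcfiledump hdfs://XXXX/delete_delta_0000020_0000020_0000/bucket_00000
{"operation":2,"originalTransaction":15,"bucket":0,"rowId":3,"currentTransaction":20,"row":null}

hive --orcfiledump hdfs://XXXX/delta_0000020_0000020_0000/bucket_00000
{"operation":0,"originalTransaction":20,"bucket":0,"rowId":0,"currentTransaction":20,"row":{...}}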
Hope this helps,
Peter

> On Feb 5, 2020, at 09:39, David Morin <morin.david....@gmail.com> wrote:
>
> Hi,
>
> It works pretty well, but... problems still occur sometimes.
> Do we have to separate operations?
>
> Here is the content of the ORC files:
> hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000
> {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
>
> hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000
> {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
>
> => When I execute a SELECT statement on the PK I see 2 records. The link
> between them is not made, even though the metadata looks good.
> The only difference I see is that in these ORC files I mix operations.
> When I perform a MERGE statement, for example, 2 directories are created:
> hdfs://XXXX/delta_0199073_0199073_0000
> hdfs://XXXX/delta_0199073_0199073_0002
> The first one contains the updates (operation:1) and the second one the
> inserts (operation:0).
>
> Thanks for your help
> David
>
> On 2019/12/01 16:57:08, David Morin <morin.david....@gmail.com> wrote:
>> Hi Peter,
>>
>> At the moment I have a pipeline based on Flink to write ORC files. These ORC
>> files can be read from Hive thanks to external tables, and then a MERGE
>> statement (triggered by Oozie) pushes this data into tables managed by Hive
>> (transactional tables => ORC). The Hive version is 2.1 because this is the
>> one provided by HDP 2.6.5.
>> We've developed a system that writes Hive delta files for the managed tables
>> directly from Flink.
>> The current streaming APIs for Hive 2 are not suitable for our needs and we
>> cannot use the new Hive 3 streaming API yet. This system uses the Flink
>> state to store the Hive metadata (originalTransaction, bucket, rowId, ...).
>> Thanks for your reply, because yes, when the files are ordered by
>> originalTransaction, bucket, rowId, it works! I just have to use 1
>> transaction instead of 2 at the moment and it will be OK.
>>
>> Thanks
>> David
>>
>> On 2019/11/29 11:18:05, Peter Vary <pv...@cloudera.com> wrote:
>>> Hi David,
>>>
>>> Not entirely sure what you are doing here :), my guess is that you are
>>> trying to write ACID tables outside of Hive. Am I right? What is the exact
>>> use case? There might be better solutions out there than writing the files
>>> by hand.
>>>
>>> As for your question below: yes, the files should be ordered by the
>>> originalTransaction, bucket, rowId triple, otherwise you will get wrong
>>> results.
>>>
>>> Thanks,
>>> Peter
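The ordering requirement Peter states can be enforced with a simple sort before the events are flushed to a bucket file. A minimal sketch in Java, assuming a plain event holder (AcidEvent and its fields are illustrative names, not Hive's actual writer API):

import java.util.Comparator;
import java.util.List;

class AcidEvent {
    long originalTransaction;
    int bucket;
    long rowId;
    // operation, currentTransaction and the row payload would live here too
}

class AcidEventOrder {
    // Events inside one bucket file must be sorted by the
    // (originalTransaction, bucket, rowId) triple; otherwise the
    // read-side merge cannot match delete events to the rows they delete.
    static final Comparator<AcidEvent> ACID_ORDER =
            Comparator.<AcidEvent>comparingLong(e -> e.originalTransaction)
                      .thenComparingInt(e -> e.bucket)
                      .thenComparingLong(e -> e.rowId);

    static void sortBeforeFlush(List<AcidEvent> events) {
        events.sort(ACID_ORDER);
    }
}

Note that in the dump from the Nov 19 message below, delta_0011368 and delta_0011369 both write rowId 1 before rowId 0, which breaks exactly this ordering.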
>>>> On Nov 19, 2019, at 13:30, David Morin <morin.david....@gmail.com> wrote:
>>>>
>>>> Here are more details about the ORC content and the duplicate rows we
>>>> are seeing:
>>>>
>>>> /delta_0011365_0011365_0000/bucket_00003
>>>>
>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
>>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
>>>>
>>>> /delta_0011368_0011368_0000/bucket_00003
>>>>
>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
>>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
>>>>
>>>> /delta_0011369_0011369_0000/bucket_00003
>>>>
>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
>>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
>>>>
>>>> +-------------------------------------------------+-------+--+
>>>> |                     row__id                     |  cle  |
>>>> +-------------------------------------------------+-------+--+
>>>> | {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
>>>> | {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
>>>> | {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
>>>> | {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
>>>> | {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
>>>> | {"transactionid":11365,"bucketid":3,"rowid":0}  | 5218  |
>>>> | {"transactionid":11365,"bucketid":3,"rowid":1}  | 5216  |
>>>> | {"transactionid":11369,"bucketid":3,"rowid":1}  | 5216  |
>>>> | {"transactionid":11369,"bucketid":3,"rowid":0}  | 5218  |
>>>> | {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
>>>> | {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
>>>> | {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
>>>> +-------------------------------------------------+-------+--+
>>>>
>>>> As you can see, we have duplicate rows for the "cle" values 5216 and 5218.
>>>> Do we have to keep the rowIds ordered? That is the only difference I have
>>>> noticed, based on some tests with beeline.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, Nov 19, 2019 at 00:18, David Morin <morin.david....@gmail.com
>>>> <mailto:morin.david....@gmail.com>> wrote:
>>>> Hello,
>>>>
>>>> I'm trying to understand the purpose of the rowId column inside an ORC
>>>> delta file:
>>>> {"transactionid":11359,"bucketid":5,"rowid":0}
>>>> ORC view:
>>>> {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
>>>> I use HDP 2.6 => Hive 2.
>>>>
>>>> If I want to be idempotent with INSERT / DELETE / INSERT, do we have to
>>>> keep the same rowId? It seems that when the rowId is changed during the
>>>> second INSERT I get a duplicate row.
>>>> I assumed I could create a new rowId for the new transaction during the
>>>> second INSERT, but that seems to generate duplicate records.
>>>>
>>>> Regards,
>>>> David
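For the INSERT / DELETE / INSERT sequence in this last question to be idempotent, each delete event has to point back at the exact (originalTransaction, bucket, rowId) triple the row was originally written with, which is what the Flink-state bookkeeping described in the 2019/12/01 message provides. A minimal sketch of such a registry (all class and method names are hypothetical, not Flink's or Hive's API):

import java.util.HashMap;
import java.util.Map;

class RowPosition {
    final long originalTransaction;
    final int bucket;
    final long rowId;

    RowPosition(long originalTransaction, int bucket, long rowId) {
        this.originalTransaction = originalTransaction;
        this.bucket = bucket;
        this.rowId = rowId;
    }
}

class RowIdRegistry {
    // Maps a business key (e.g. the "cle" column) to the position its
    // current version was written at.
    private final Map<String, RowPosition> positions = new HashMap<>();

    // On insert: remember where the row landed.
    void recordInsert(String key, long txn, int bucket, long rowId) {
        positions.put(key, new RowPosition(txn, bucket, rowId));
    }

    // On delete: the delete event must carry the stored triple of the row
    // being removed, not freshly generated values.
    RowPosition positionForDelete(String key) {
        return positions.remove(key);
    }
}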