[ 
https://issues.apache.org/jira/browse/HIVE-23048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066714#comment-17066714
 ] 

Peter Vary commented on HIVE-23048:
-----------------------------------

Just to summarize what I have found:
 * NEXT_TXN_ID is used for multiple purposes
 ** Generate txnId - sequence is adequate substitution for this purpose
 ** Make sure that the openTxns, and commitTxn (for writes) commands are 
serialized - we need extra lock for this. In a follow-up Jira we should check 
if other places are depend on the serialization, or not. We might be able to 
create only shared locks for {{openTxns}}, and exclusive locks for 
{{commitTxn}} with writes. Needs further investigation.
 ** Also used for Compactor during the old data cleanup, and when deciding 
which compaction can proceed. These tasks usually require a consistent view of 
the related tables. We have to replace this part of the code with single query 
solutions to ensure the consistent view.
 ** NEXT_TXN_ID was used as a cache for the HWM. We have to replace it with 
extra queries
 * Autogeneration of the keys is:
 ** Working for
 *** Derby
 *** Postgres
 *** SQL Server (mssql)
 ** Questionable for
 *** Oracle 11 or earlier (Oracle 12c has Identity) - We have to use sequences 
to manually insert data 
 *** MySQL 5.x (MySQL 8.0 fixes this) - 

{quote}"In *MySQL 5.7* and earlier, the auto-increment counter is *stored only 
in main memory, not on disk*. To initialize an auto-increment counter after a 
server restart, InnoDB would execute the equivalent of the following statement 
on the first insert into a table containing an AUTO_INCREMENT column.
{quote}_SELECT MAX(ai_col) FROM table_name FOR UPDATE;
{quote}
In MySQL 8.0, this behavior is changed. The current maximum auto-increment 
counter value is written to the redo log each time it changes and is saved to 
an engine-private system table on each checkpoint. These changes make the 
current maximum auto-increment counter value persistent across server restarts."
{quote} * Retrieving the value for the generated key with batch inserts:
 ** Working for
 *** MySQL
 *** Postgres
 *** Oracle
 ** Not working for
 *** SQL Server
 *** Derby

Also the performance of inserts (table with 2 columns for testing) with 
generated keys is really dependent of the database (1000 items, 100 loops, 
local db instances, in ms):
{code:java}
                       Gen keys   Gen keys    Gen keys   Gen keys   No keys  No 
keys    No keys    Value   
                       nobatch    batched     long prep  long stat  batched  
long prep  long stat  retrieval
MySQL (5.7.23)          204        10          10         10         10        
9         12         0.42
PostgreSQL (12.2)        68        12           5          4          4        
3          4         0.57
Oracle (11.2.0.2.0)    2202       211         Error      Error      122      
438        425         2.37
MS SQL Server (14)     1988       Error       Error      Error       26       
41          8         2.33
Apache Derby (10.14.1) 2043       Error       Error      Error       41      
104        106         4.19
{code}
Current implementation uses long static query without retrieving keys ({{No 
keys long stat}} column)
 If we move forward with the autogenerated keys then we have to use the 
generated keys feature, or have to query the inserted data to be able to return 
the new identifiers again.

> Use sequences for TXN_ID generation
> -----------------------------------
>
>                 Key: HIVE-23048
>                 URL: https://issues.apache.org/jira/browse/HIVE-23048
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>         Attachments: DbTest.java
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to