[ https://issues.apache.org/jira/browse/HIVE-23048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066714#comment-17066714 ]
Peter Vary commented on HIVE-23048: ----------------------------------- Just to summarize what I have found: * NEXT_TXN_ID is used for multiple purposes ** Generate txnId - sequence is adequate substitution for this purpose ** Make sure that the openTxns, and commitTxn (for writes) commands are serialized - we need extra lock for this. In a follow-up Jira we should check if other places are depend on the serialization, or not. We might be able to create only shared locks for {{openTxns}}, and exclusive locks for {{commitTxn}} with writes. Needs further investigation. ** Also used for Compactor during the old data cleanup, and when deciding which compaction can proceed. These tasks usually require a consistent view of the related tables. We have to replace this part of the code with single query solutions to ensure the consistent view. ** NEXT_TXN_ID was used as a cache for the HWM. We have to replace it with extra queries * Autogeneration of the keys is: ** Working for *** Derby *** Postgres *** SQL Server (mssql) ** Questionable for *** Oracle 11 or earlier (Oracle 12c has Identity) - We have to use sequences to manually insert data *** MySQL 5.x (MySQL 8.0 fixes this) - {quote}"In *MySQL 5.7* and earlier, the auto-increment counter is *stored only in main memory, not on disk*. To initialize an auto-increment counter after a server restart, InnoDB would execute the equivalent of the following statement on the first insert into a table containing an AUTO_INCREMENT column. {quote}_SELECT MAX(ai_col) FROM table_name FOR UPDATE; {quote} In MySQL 8.0, this behavior is changed. The current maximum auto-increment counter value is written to the redo log each time it changes and is saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts." {quote} * Retrieving the value for the generated key with batch inserts: ** Working for *** MySQL *** Postgres *** Oracle ** Not working for *** SQL Server *** Derby Also the performance of inserts (table with 2 columns for testing) with generated keys is really dependent of the database (1000 items, 100 loops, local db instances, in ms): {code:java} Gen keys Gen keys Gen keys Gen keys No keys No keys No keys Value nobatch batched long prep long stat batched long prep long stat retrieval MySQL (5.7.23) 204 10 10 10 10 9 12 0.42 PostgreSQL (12.2) 68 12 5 4 4 3 4 0.57 Oracle (11.2.0.2.0) 2202 211 Error Error 122 438 425 2.37 MS SQL Server (14) 1988 Error Error Error 26 41 8 2.33 Apache Derby (10.14.1) 2043 Error Error Error 41 104 106 4.19 {code} Current implementation uses long static query without retrieving keys ({{No keys long stat}} column) If we move forward with the autogenerated keys then we have to use the generated keys feature, or have to query the inserted data to be able to return the new identifiers again. > Use sequences for TXN_ID generation > ----------------------------------- > > Key: HIVE-23048 > URL: https://issues.apache.org/jira/browse/HIVE-23048 > Project: Hive > Issue Type: Bug > Reporter: Peter Vary > Assignee: Peter Vary > Priority: Major > Attachments: DbTest.java > > -- This message was sent by Atlassian Jira (v8.3.4#803005)