[ https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=836612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836612 ]
ASF GitHub Bot logged work on HIVE-26035: ----------------------------------------- Author: ASF GitHub Bot Created on: 03/Jan/23 13:12 Start Date: 03/Jan/23 13:12 Worklog Time Spent: 10m Work Description: VenuReddy2103 opened a new pull request, #3905: URL: https://github.com/apache/hive/pull/3905 ### What changes were proposed in this pull request? Proposing to use direct SQL for add partitions. 1. Added a datanucleus property `datanucleus.query.sql.allowAll` to allow insert operation from JDO SQL(default value is true). 2. Added a property `metastore.direct.sql.max.parameters.in.insert` to control the maximum rows in multiple row insert query to database. 3. Implemented utility to get identity value from the DataNucleus value generator. 4. Implemented multiple row insert in batches for all the tables involved in add partitions. 5. Implemented fallback to JDO based in case of any failure in direct SQL. ### Why are the changes needed? When data is loaded/inserted to the dynamic partition table, many partitions are created based on the cardinality of the partition column(s). These partitions are updated to the hive metastore backing datastore using `pm.makePersistentAll()` datanucleus JDO API. Partition model class depends on(or holds reference to) several other model classes/structs such as Storage descriptor, Serdes, Serde params, Partition params, Partition key values etc. When a model class instance is persisted using JDO API, datanucleus internally does cascade persistence of all the other model instances it depends on recursively. During this process, it fires a lot of SQL queries to the backing datastore for the tables of the model classes/structs respectively. Many of the queries among them are redundant and takes longer time when partitions are more. [HIVE-26035](https://issues.apache.org/jira/browse/HIVE-26035) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested the scenario manually. Ran all the unittest cases. Below is the performance measurement in the local system using MySQL database with metastore-benchmarks tool available at [https://github.com/akolb1/hclient/tree/master/metastore-benchmarks](https://github.com/akolb1/hclient/tree/master/metastore-benchmarks). <img width="616" alt="image" src="https://user-images.githubusercontent.com/35334869/210361898-6a18ed2b-5347-4001-bbea-fa0f2dddf5fc.png"> Issue Time Tracking ------------------- Worklog Id: (was: 836612) Remaining Estimate: 0h Time Spent: 10m > Explore moving to directsql for ObjectStore::addPartitions > ---------------------------------------------------------- > > Key: HIVE-26035 > URL: https://issues.apache.org/jira/browse/HIVE-26035 > Project: Hive > Issue Type: Bug > Reporter: Rajesh Balamohan > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently {{addPartitions}} uses datanuclues and is super slow for large > number of partitions. It will be good to move to direct sql. Lots of repeated > SQLs can be avoided as well (e.g SDS, SERDE, TABLE_PARAMS) -- This message was sent by Atlassian Jira (v8.20.10#820010)