[ https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=840810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-840810 ]
ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jan/23 16:32
            Start Date: 21/Jan/23 16:32
    Worklog Time Spent: 10m
      Work Description: VenuReddy2103 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1083308365


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########

@@ -515,6 +529,803 @@ public List<String> getMaterializedViewsForRewriting(String dbName) throws MetaE
     }
   }
 
+  private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+    ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+    AbstractClassMetaData cmd = ec.getMetaDataManager().getMetaDataForClass(modelClass, ec.getClassLoaderResolver());
+    if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+      return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, cmd, -1);
+    } else {
+      throw new MetaException("Identity type is not datastore.");
+    }
+  }
+
+  /**
+   * Interface to execute multiple row insert query in batch for direct SQL
+   */
+  interface BatchExecutionContext {
+    void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException;
+  }
+
+  private void insertInBatch(String tableName, String columns, int columnCount, String rowFormat, int rowCount,
+      BatchExecutionContext bec) throws MetaException {
+    if (rowCount == 0 || columnCount == 0) {
+      return;
+    }
+    int maxParamsCount = maxParamsInInsert;
+    if (maxParamsCount < columnCount) {
+      LOG.error("Maximum number of parameters in the direct SQL batch insert query is less than the table: {}"
+          + " columns. Executing single row insert queries.", tableName);
+      maxParamsCount = columnCount;
+    }
+    int maxRowsInBatch = maxParamsCount / columnCount;
+    int maxBatches = rowCount / maxRowsInBatch;
+    int last = rowCount % maxRowsInBatch;
+    String query = "";
+    if (maxBatches > 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+    }
+    int batchParamCount = maxRowsInBatch * columnCount;
+    for (int batch = 0; batch < maxBatches; batch++) {
+      bec.execute(query, maxRowsInBatch, batchParamCount);
+    }
+    if (last != 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+      bec.execute(query, last, last * columnCount);
+    }
+  }
+
+  private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) throws MetaException {
+    int rowCount = serdeIdToSerDeInfo.size();
+    String columns = "(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+        + "\"SERIALIZER_CLASS\")";
+    String row = "(?,?,?,?,?,?,?)";
+    int columnCount = 7;
+    BatchExecutionContext bec = new BatchExecutionContext() {

Review Comment:
   Actually Batchable.runBatched() expects the input in the form of a list. It is used when the input is a list of partition names/ids or column names. But in this case, the objects to insert are not available as a list, so I defined a new interface local to this file's scope.
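   For readers following the thread, here is a minimal, self-contained sketch of the batching arithmetic and callback pattern discussed above. It is not the PR's actual code: the class name BatchInsertDemo, the helper buildBatchInsertQuery (a stand-in for dbType.getBatchInsertQuery), and the simplified BatchExecutionContext without a throws clause are all hypothetical, introduced only to illustrate how insertInBatch splits rows into statements capped by a parameter limit and invokes the callback once per generated batch.

   import java.util.StringJoiner;

   public class BatchInsertDemo {

     /** Mirrors the callback interface from the PR: one call per generated batch. */
     interface BatchExecutionContext {
       void execute(String batchQueryText, int batchRowCount, int batchParamCount);
     }

     /** Hypothetical stand-in for dbType.getBatchInsertQuery(): a multi-row VALUES insert. */
     static String buildBatchInsertQuery(String table, String columns, String rowFormat, int rows) {
       StringJoiner values = new StringJoiner(",");
       for (int i = 0; i < rows; i++) {
         values.add(rowFormat);
       }
       return "INSERT INTO " + table + " " + columns + " VALUES " + values;
     }

     /** Same arithmetic as insertInBatch: cap the number of bind parameters per statement. */
     static void insertInBatch(String table, String columns, int columnCount, String rowFormat,
         int rowCount, int maxParamsInInsert, BatchExecutionContext bec) {
       if (rowCount == 0 || columnCount == 0) {
         return;
       }
       // If the cap is smaller than one row's worth of columns, fall back to single-row inserts.
       int maxParamsCount = Math.max(maxParamsInInsert, columnCount);
       int maxRowsInBatch = maxParamsCount / columnCount;
       int maxBatches = rowCount / maxRowsInBatch;
       int last = rowCount % maxRowsInBatch;
       if (maxBatches > 0) {
         String query = buildBatchInsertQuery(table, columns, rowFormat, maxRowsInBatch);
         for (int batch = 0; batch < maxBatches; batch++) {
           bec.execute(query, maxRowsInBatch, maxRowsInBatch * columnCount);
         }
       }
       if (last != 0) {
         bec.execute(buildBatchInsertQuery(table, columns, rowFormat, last), last, last * columnCount);
       }
     }

     public static void main(String[] args) {
       String columns = "(\"SERDE_ID\",\"NAME\",\"SLIB\")";
       // 10 rows of 3 columns with at most 8 parameters per statement
       // => batches of 2 rows (6 params), i.e. 5 full batches and no remainder.
       insertInBatch("\"SERDES\"", columns, 3, "(?,?,?)", 10, 8,
           (query, rows, params) -> System.out.println(rows + " rows / " + params + " params: " + query));
     }
   }

   The callback receives the already-built statement text plus row/parameter counts rather than a sublist of inputs, which is why Batchable.runBatched() (list-driven) does not fit here.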
Issue Time Tracking
-------------------

    Worklog Id:     (was: 840810)
    Time Spent: 3.5h  (was: 3h 20m)

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move it to direct SQL. Lots of repeated
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)