[ 
https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=840810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-840810
 ]

ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jan/23 16:32
            Start Date: 21/Jan/23 16:32
    Worklog Time Spent: 10m 
      Work Description: VenuReddy2103 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1083308365


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -515,6 +529,803 @@ public List<String> getMaterializedViewsForRewriting(String dbName) throws MetaE
     }
   }
 
+  private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+    ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+    AbstractClassMetaData cmd = ec.getMetaDataManager().getMetaDataForClass(modelClass, ec.getClassLoaderResolver());
+    if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+      return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, cmd, -1);
+    } else {
+      throw new MetaException("Identity type is not datastore.");
+    }
+  }
+
+  /**
+   * Interface to execute multiple row insert query in batch for direct SQL
+   */
+  interface BatchExecutionContext {
+    void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException;
+  }
+
+  private void insertInBatch(String tableName, String columns, int columnCount, String rowFormat, int rowCount,
+      BatchExecutionContext bec) throws MetaException {
+    if (rowCount == 0 || columnCount == 0) {
+      return;
+    }
+    int maxParamsCount = maxParamsInInsert;
+    if (maxParamsCount < columnCount) {
+      LOG.error("Maximum number of parameters in the direct SQL batch insert query is less than the table: {}"
+          + " columns. Executing single row insert queries.", tableName);
+      maxParamsCount = columnCount;
+    }
+    int maxRowsInBatch = maxParamsCount / columnCount;
+    int maxBatches = rowCount / maxRowsInBatch;
+    int last = rowCount % maxRowsInBatch;
+    String query = "";
+    if (maxBatches > 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+    }
+    int batchParamCount = maxRowsInBatch * columnCount;
+    for (int batch = 0; batch < maxBatches; batch++) {
+      bec.execute(query, maxRowsInBatch, batchParamCount);
+    }
+    if (last != 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+      bec.execute(query, last, last * columnCount);
+    }
+  }
+
+  private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) throws MetaException {
+    int rowCount = serdeIdToSerDeInfo.size();
+    String columns = "(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+        + "\"SERIALIZER_CLASS\")";
+    String row = "(?,?,?,?,?,?,?)";
+    int columnCount = 7;
+    BatchExecutionContext bec = new BatchExecutionContext() {

Review Comment:
   Actually Batchable.runBatched() expects its input in the form of a list. It is used when the input is a list of partition names/ids or column names. In this case, however, the objects to insert are not available as a list, so I defined a new interface local to this file's scope.
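For context, the batching arithmetic in insertInBatch above splits rowCount rows into full batches of maxParamsCount / columnCount rows each, plus one remainder batch, falling back to single-row inserts when even one row exceeds the parameter cap. A minimal standalone sketch of that split (class and method names here are illustrative, not part of the patch):

```java
public class BatchSplit {
  // Returns {rowsPerFullBatch, fullBatchCount, remainderRows} for a table with
  // columnCount columns, rowCount rows, and a cap on SQL parameters per query.
  static int[] split(int rowCount, int columnCount, int maxParams) {
    // If even one row exceeds the cap, degrade to single-row inserts.
    int cap = Math.max(maxParams, columnCount);
    int rowsPerBatch = cap / columnCount;
    return new int[] {rowsPerBatch, rowCount / rowsPerBatch, rowCount % rowsPerBatch};
  }

  public static void main(String[] args) {
    // 7 columns, cap of 21 params -> 3 rows per batch: 3 full batches + 1 leftover row
    int[] r = split(10, 7, 21);
    assert r[0] == 3 && r[1] == 3 && r[2] == 1;
    // Cap below the column count -> one row per batch
    int[] s = split(5, 7, 3);
    assert s[0] == 1 && s[1] == 5 && s[2] == 0;
  }
}
```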





Issue Time Tracking
-------------------

    Worklog Id:     (was: 840810)
    Time Spent: 3.5h  (was: 3h 20m)

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} goes through DataNucleus and is very slow for a large
> number of partitions. It would be good to move it to direct SQL. Lots of repeated
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).
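The repeated-SQL point is what the multi-row insert in the patch addresses: one parameterized statement carries many rows. A hedged sketch of what a getBatchInsertQuery-style helper might produce (this helper is illustrative; it is not Hive's actual DatabaseProduct API):

```java
public class BatchInsertSql {
  // Builds "INSERT INTO table (cols) VALUES (?,...),(?,...)" with rowCount
  // copies of the parameterized row format appended after VALUES.
  static String batchInsertQuery(String table, String columns, String rowFormat, int rowCount) {
    StringBuilder sb = new StringBuilder("INSERT INTO ").append(table)
        .append(' ').append(columns).append(" VALUES ");
    for (int i = 0; i < rowCount; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(rowFormat);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String q = batchInsertQuery("\"SERDES\"", "(\"SERDE_ID\",\"NAME\")", "(?,?)", 2);
    // Two rows share one round trip instead of two single-row INSERTs.
    assert q.equals("INSERT INTO \"SERDES\" (\"SERDE_ID\",\"NAME\") VALUES (?,?),(?,?)");
  }
}
```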



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
