[ https://issues.apache.org/jira/browse/HIVE-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683928#comment-17683928 ]
Venugopal Reddy K commented on HIVE-26035:
------------------------------------------

*Without direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar ./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -o 0302bench_results_http_modified-1.csv -C -d testbench_http --params=100 -E 'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getNid' -E 'listDatabases' -E 'getTable' -E 'createTable' -T 5
Operation                      Mean    Med     Min     Max     Err%
addPartition                   62.47   55.34   32.66   172.1   37.15
addPartitions.100              191.8   182.3   167.0   292.5   12.68
concurrentPartitionAdd#5.100   1476    1464    1351    2162    6.815
{noformat}

*With direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar ./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -o 0302bench_results_http_modified.csv -C -d testbench_http --params=100 -E 'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getNid' -E 'listDatabases' -E 'getTable' -E 'createTable' -T 5
Operation                      Mean    Med     Min     Max     Err%
addPartition                   66.33   59.16   36.85   176.2   40.69
addPartitions.100              81.58   74.11   59.33   240.6   31.03
concurrentPartitionAdd#5.100   410.4   391.4   345.8   1063    19.04
{noformat}

Time taken to add 100 and 1000 partitions, in milliseconds:
*Base version (without direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|189.552|176.996|149.402|314.393|18.2392|
|addPartitions.1000|1641.48|1624.37|1577.92|1847.76|3.07802|
|concurrentPartitionAdd#2.100|390.799|377.246|352.988|544.446|8.13874|
|concurrentPartitionAdd#2.1000|3441.06|3419.13|3333.46|3931.22|2.50776|

*After modification (with direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|83.0217|72.2195|58.8024|214.897|33.4667|
|addPartitions.1000|506.649|496.345|473.402|687.063|6.23298|
|concurrentPartitionAdd#2.100|178.152|168.228|150.619|304.203|14.7953|
|concurrentPartitionAdd#2.1000|1144.33|1132.06|1092.98|1456.85|4.02413|

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move to direct SQL. Many repeated
> SQL queries can be avoided as well (e.g. on SDS, SERDE, TABLE_PARAMS).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
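The improvement in the "Base version" and "After modification" tables is easiest to read as a ratio of mean latencies. The following Python sketch makes that arithmetic explicit and is not part of hmsbench or Hive. The helper names `speedup` and `per_partition_ms` are hypothetical, chosen only for illustration. The sketch also assumes the Mean column is in milliseconds, as the comment states. It divides each baseline mean by the corresponding direct SQL mean. For the single-call `addPartitions` runs, it also divides by the partition count to get an amortised per-partition cost.

```python
"""Speedup arithmetic for the HIVE-26035 addPartitions benchmark tables.

Numbers are mean latencies (assumed milliseconds) copied from the
"Base version" and "After modification" tables; helper names are hypothetical.
"""

# (operation, mean without direct SQL, mean with direct SQL)
MEANS_MS = [
    ("addPartitions.100", 189.552, 83.0217),
    ("addPartitions.1000", 1641.48, 506.649),
    ("concurrentPartitionAdd#2.100", 390.799, 178.152),
    ("concurrentPartitionAdd#2.1000", 3441.06, 1144.33),
]


def speedup(baseline_ms: float, direct_sql_ms: float) -> float:
    """How many times faster the direct SQL run is than the baseline run."""
    return baseline_ms / direct_sql_ms


def per_partition_ms(total_ms: float, partitions: int) -> float:
    """Mean cost of one partition when `partitions` are added in a single call."""
    return total_ms / partitions


if __name__ == "__main__":
    # Ratio of baseline mean to direct SQL mean for every benchmarked operation.
    for op, base, direct in MEANS_MS:
        print(f"{op:30s} {base:9.2f} -> {direct:9.2f} ms  ({speedup(base, direct):.2f}x)")

    # Amortised per-partition cost, only for the single-call addPartitions rows.
    for n, base, direct in [(100, 189.552, 83.0217), (1000, 1641.48, 506.649)]:
        print(f"{n:5d} partitions: {per_partition_ms(base, n):.2f} ms -> "
              f"{per_partition_ms(direct, n):.2f} ms per partition")
```

The resulting speedups are roughly 2.28x, 3.24x, 2.19x and 3.01x. The direct SQL cost per partition falls from about 0.83 ms at 100 partitions to about 0.51 ms at 1000 partitions. The baseline stays between about 1.6 and 1.9 ms per partition. The larger gain at 1000 partitions is consistent with the issue's point that repeated per-partition SQL is what direct SQL avoids, though these numbers alone do not prove the cause.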