[ 
https://issues.apache.org/jira/browse/HIVE-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683928#comment-17683928
 ] 

Venugopal Reddy K commented on HIVE-26035:
------------------------------------------

*Without direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar 
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost 
--savedata /tmp/benchdata --sanitize -N 100 -o 
0302bench_results_http_modified-1.csv -C -d testbench_http --params=100  -E 
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 
'getPartitionNames.*' -E 'listPartition' -E 'getPartition'  -E 'getNid' -E 
'listDatabases' -E 'getTable' -E 'createTable'  -T 5
Operation                      Mean     Med      Min      Max      Err%    
addPartition                   62.47    55.34    32.66    172.1    37.15   
addPartitions.100              191.8    182.3    167.0    292.5    12.68   
concurrentPartitionAdd#5.100   1476     1464     1351     2162     6.815 
{noformat}
 

*With direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar 
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost 
--savedata /tmp/benchdata --sanitize -N 100 -o 
0302bench_results_http_modified.csv -C -d testbench_http --params=100  -E 
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 
'getPartitionNames.*' -E 'listPartition' -E 'getPartition'  -E 'getNid' -E 
'listDatabases' -E 'getTable' -E 'createTable'  -T 5 
Operation                      Mean     Med      Min      Max      Err%    
addPartition                   66.33    59.16    36.85    176.2    40.69   
addPartitions.100              81.58    74.11    59.33    240.6    31.03   
concurrentPartitionAdd#5.100   410.4    391.4    345.8    1063     19.04
{noformat}
 

Time to add 100 and 1000 partitions, in milliseconds:

*Base version (without direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|189.552|176.996|149.402|314.393|18.2392|
|addPartitions.1000|1641.48|1624.37|1577.92|1847.76|3.07802|
|concurrentPartitionAdd#2.100|390.799|377.246|352.988|544.446|8.13874|
|concurrentPartitionAdd#2.1000|3441.06|3419.13|3333.46|3931.22|2.50776|

 

*After modification (with direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|83.0217|72.2195|58.8024|214.897|33.4667|
|addPartitions.1000|506.649|496.345|473.402|687.063|6.23298|
|concurrentPartitionAdd#2.100|178.152|168.228|150.619|304.203|14.7953|
|concurrentPartitionAdd#2.1000|1144.33|1132.06|1092.98|1456.85|4.02413|
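
For a quick read of the numbers above, the speedup is the base mean divided by the direct SQL mean. A minimal sketch in Python, assuming the mean values are copied by hand from the results in this comment; the dictionary and function names are illustrative and not part of hmsbench:

```python
# Mean timings copied from the benchmark output in this comment.
# Keys are the hmsbench operation names; values are (base, direct SQL) means.
MEANS = {
    "addPartition": (62.47, 66.33),
    "addPartitions.100 (-T 5 run)": (191.8, 81.58),
    "concurrentPartitionAdd#5.100": (1476.0, 410.4),
    "addPartitions.100": (189.552, 83.0217),
    "addPartitions.1000": (1641.48, 506.649),
    "concurrentPartitionAdd#2.100": (390.799, 178.152),
    "concurrentPartitionAdd#2.1000": (3441.06, 1144.33),
}


def speedup(base_mean, direct_sql_mean):
    """Return how many times faster the direct SQL run is than the base run."""
    return base_mean / direct_sql_mean


# Speedup per operation; values below 1.0 mean the direct SQL run was slower.
SPEEDUPS = {op: speedup(base, direct) for op, (base, direct) in MEANS.items()}

for op, factor in SPEEDUPS.items():
    print(f"{op:32s} {factor:.2f}x")
```

With these inputs the batch operations come out between roughly 2x and 3.6x faster with direct SQL, while the single {{addPartition}} call is essentially unchanged, which is within the reported Err%.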

 

 

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is super slow for a large 
> number of partitions. It would be good to move to direct SQL. Lots of repeated 
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).
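
The general idea in the description, issuing batched SQL instead of one statement per object, can be illustrated outside Hive. The sketch below is not the HIVE-26035 patch and does not use the metastore schema; it assumes a hypothetical {{parts}} table in an in-memory SQLite database and only contrasts row-by-row inserts with a single batched insert:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical partitions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parts ("
        "part_id INTEGER PRIMARY KEY, tbl_id INTEGER, part_name TEXT)"
    )
    return conn


def add_partitions_row_by_row(conn, tbl_id, part_names):
    """Issue one INSERT per partition, similar in spirit to per-object persistence."""
    for name in part_names:
        conn.execute(
            "INSERT INTO parts (tbl_id, part_name) VALUES (?, ?)",
            (tbl_id, name),
        )
    conn.commit()


def add_partitions_batched(conn, tbl_id, part_names):
    """Prepare the INSERT once and run it for all partitions as one batch."""
    conn.executemany(
        "INSERT INTO parts (tbl_id, part_name) VALUES (?, ?)",
        [(tbl_id, name) for name in part_names],
    )
    conn.commit()


def count_parts(conn):
    """Return the number of rows currently in the parts table."""
    return conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]


if __name__ == "__main__":
    names = [f"ds=2023-01-{day:02d}" for day in range(1, 31)]
    conn = make_db()
    add_partitions_batched(conn, 1, names)
    print(count_parts(conn))
```

Both functions leave the table in the same state; the batched variant just avoids re-issuing the statement for every row, which is where a remote database would save round trips.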



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
