[jira] [Commented] (HIVE-26437) dump unpartitioned managed table metadata in parallel

Amit Saonerkar (Jira) Tue, 01 Nov 2022 05:07:07 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627126#comment-17627126
 ]


Amit Saonerkar commented on HIVE-26437:
---------------------------------------

*+JMH Performance Benchmark Test+* 
(itests/hive-jmh/src/main/java/org/apache/hive/benchmark/ql/exec/TableAndPartitionExportBench.java)

The test runs in two modes, parallel and serial. The parallel mode uses 
ExportService with 100 threads to execute a total of 500 tasks. The serial 
mode does not use ExportService and executes the same 500 tasks on a single 
thread. The test runs 5 iterations of each mode and measures the run time in 
milliseconds. The average time over the 5 iterations is reported as the 
benchmark score for each mode, and the two scores are compared.
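
The two modes can be illustrated with plain java.util.concurrent. This is only a rough sketch and not the code in TableAndPartitionExportBench: the class name ParallelVsSerialSketch and the helper fakeExportTask are hypothetical, a short sleep stands in for the real table export work, and a fixed thread pool is assumed in place of ExportService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelVsSerialSketch {

    // Hypothetical stand-in for one table export: sleeps to mimic blocking work.
    static Callable<Integer> fakeExportTask(int id, long sleepMillis) {
        return () -> {
            Thread.sleep(sleepMillis);
            return id;
        };
    }

    // Serial mode: every task runs on the calling thread, one after another.
    static int runSerial(List<Callable<Integer>> tasks) throws Exception {
        int completed = 0;
        for (Callable<Integer> task : tasks) {
            task.call();
            completed++;
        }
        return completed;
    }

    // Parallel mode: tasks are handed to a fixed-size pool (assumed here in
    // place of ExportService) and the caller waits for all of them.
    static int runParallel(List<Callable<Integer>> tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int completed = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                result.get(); // rethrows a task failure as ExecutionException
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 500 tasks and 100 threads mirror the numbers quoted in the comment.
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            tasks.add(fakeExportTask(i, 2));
        }

        long serialStart = System.nanoTime();
        runSerial(tasks);
        long serialMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - serialStart);

        long parallelStart = System.nanoTime();
        runParallel(tasks, 100);
        long parallelMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - parallelStart);

        System.out.println("serial ms: " + serialMs + ", parallel ms: " + parallelMs);
    }
}
```

Because the simulated tasks only block, the sketch shows the shape of the comparison rather than the real export cost, so its timings should not be read as Hive numbers.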





*Results:*

The JMH benchmark results indicate an improvement in the TableExport 
operation. When the tables are dumped in parallel instead of serially, the 
operation completes roughly 80 times faster (about 0.64 sec versus about 
51.7 sec). Below are the results for 500 table export operations run in both 
serial and parallel mode.

Result 
"org.apache.hive.benchmark.ql.exec.TableAndPartitionExportBench.BaseBench.parallel":

N = 5

mean = 640.862 ±(99.9%) 113.354 ms/op

Result 
"org.apache.hive.benchmark.ql.exec.TableAndPartitionExportBench.BaseBench.serial":

N = 5

mean = 51697.322 ±(99.9%) 322.747 ms/op

|*Benchmark*|*Mode*|*Cnt*|*Score*|*Error*|*Units*|
|TableAndPartitionExportBench.BaseBench.parallel|ss|5|640.862|± 113.354|ms/op|
|TableAndPartitionExportBench.BaseBench.serial|ss|5|51697.322|± 322.747|ms/op|

 

 

*+End-to-End Performance Benchmark Numbers+* 


A database is created with 1,000 managed ACID tables, all of which are 
unpartitioned.

The config parameter REPL_TABLE_DUMP_PARALLELISM is set to 100 before the 
replication dump command is executed. With the new export service, the 
replication dump takes 9 sec to complete the table metadata dump. 

When the export service is not used, the same number of tables takes around 
27 sec to complete the entire dump process. This is a 3x improvement in the 
execution time of the replication dump command.

 
|*No. of Tables*|*REPL_TABLE_DUMP_PARALLELISM*|*Export Service used*|*Time taken for REPL DUMP*|
|1000|100|Yes|9 sec|
|1000|100|No|27 sec|

 

> dump unpartitioned managed table metadata in parallel
> -----------------------------------------------------
>
>                 Key: HIVE-26437
>                 URL: https://issues.apache.org/jira/browse/HIVE-26437
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Amit Saonerkar
>            Assignee: Amit Saonerkar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
