-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
-----------------------------------------------------------
(Updated Nov. 5, 2019, 3:32 p.m.)
Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
Changes
-------
Addressing Ashutosh's comments
Bugs: HIVE-22411
https://issues.apache.org/jira/browse/HIVE-22411
Repository: hive-git
Description
-------
Executing single insert statements on a transactional table affects write
performance on an S3 file system. Each insert creates a new delta directory.
After each insert, Hive calculates statistics such as the number of files in
the table and the total size of the table. To calculate these, it traverses the
table directory recursively, executing a separate listStatus call for each path
during the recursion. As a result, the more delta directories the table has,
the longer it takes to calculate the statistics.
Therefore, insertion time goes up linearly with the number of delta directories.
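The cost model described above can be illustrated with a small, self-contained
sketch. This is not the code from Warehouse.java or FileUtils.java and it does
not show what the patch changes. ToyFs, collectStats, listCallsAfterInserts,
the table path, the delta directory naming, and the file sizes are all
hypothetical and exist only for illustration. The sketch only counts how many
listStatus-style calls a recursive traversal issues when every insert adds one
delta directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingCostSketch {

    /** Toy in-memory file system that counts listing calls. Not a Hadoop class. */
    static final class ToyFs {
        // directory path -> paths of its direct children
        private final Map<String, List<String>> children = new HashMap<>();
        // file path -> size in bytes
        private final Map<String, Long> fileSizes = new HashMap<>();
        int listStatusCalls = 0;

        void mkdir(String parent, String name) {
            String path = parent + "/" + name;
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(path);
            children.computeIfAbsent(path, k -> new ArrayList<>());
        }

        void createFile(String parent, String name, long size) {
            String path = parent + "/" + name;
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(path);
            fileSizes.put(path, size);
        }

        boolean isDirectory(String path) {
            return children.containsKey(path);
        }

        long sizeOf(String path) {
            return fileSizes.get(path);
        }

        // Stands in for one remote listing request; on an object store each
        // such request is a separate round trip.
        List<String> listStatus(String dir) {
            listStatusCalls++;
            return children.getOrDefault(dir, Collections.<String>emptyList());
        }
    }

    /** Accumulates {fileCount, totalSize}, issuing one listing call per directory. */
    static void collectStats(ToyFs fs, String dir, long[] stats) {
        for (String child : fs.listStatus(dir)) {
            if (fs.isDirectory(child)) {
                collectStats(fs, child, stats);
            } else {
                stats[0]++;
                stats[1] += fs.sizeOf(child);
            }
        }
    }

    /** Simulates the given number of single inserts, then one statistics pass. */
    static int listCallsAfterInserts(int inserts) {
        ToyFs fs = new ToyFs();
        String table = "/warehouse/t"; // hypothetical table location
        for (int i = 1; i <= inserts; i++) {
            // Illustrative delta directory name, one directory per insert.
            String delta = String.format("delta_%07d_%07d_0000", i, i);
            fs.mkdir(table, delta);
            fs.createFile(table + "/" + delta, "bucket_00000", 100L);
        }
        long[] stats = new long[2];
        collectStats(fs, table, stats);
        return fs.listStatusCalls;
    }

    public static void main(String[] args) {
        for (int inserts : new int[] {1, 10, 100}) {
            System.out.println("inserts=" + inserts
                + " listing calls=" + listCallsAfterInserts(inserts));
        }
    }
}
```

In this toy model a statistics pass after n inserts issues n + 1 listing calls
(one for the table directory plus one per delta directory), which matches the
linear growth described above.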
Diffs (updated)
-----
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
38e843aeacf
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
bf206fffc26
Diff: https://reviews.apache.org/r/71707/diff/2/
Changes: https://reviews.apache.org/r/71707/diff/1-2/
Testing
-------
Measured and plotted insertion time.
Thanks,
Attila Magyar