[ https://issues.apache.org/jira/browse/HIVE-26815?focusedWorklogId=831677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831677 ]
ASF GitHub Bot logged work on HIVE-26815:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Dec/22 08:13
Start Date: 07/Dec/22 08:13
Worklog Time Spent: 10m
Work Description: yigress opened a new pull request, #3840:
URL: https://github.com/apache/hive/pull/3840

### What changes were proposed in this pull request?

1. Add a Hive configuration, hive.use.scratchdir.for.staging.
2. For native tables (non-MM, non-direct-insert, non-ACID), change the dynamic partition staging directory layout from
   <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
   to
   <dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>
3. When hive.use.scratchdir.for.staging=true, FileSinkOperator's dirName and DynamicContext's sourcePath change from <dest_path>/{hive.exec.stagingdir} to <hive.exec.scratchdir>.

For example, for the query

insert into/overwrite table partition(year=2001, season) select...

before the change, the FileSinkOperator conf has <table_path>/year=2001/.staging_dir/season=xxx; after the change, it has <table_path>/.staging_dir/year=2001/season=xxx.

This change allows <table_path> to be swapped with another path such as <hive.exec.scratchdir>; the MoveTask then moves the results into <table_path>.

### Why are the changes needed?

In the S3 blobstorage optimization, HIVE-15121 / HIVE-17620 changed the interim job path to use hive.exec.scratchdir and the final job to use hive.exec.stagingdir. https://issues.apache.org/jira/browse/HIVE-15215 is still open on whether to use the scratch directory as the staging directory for S3. However, for blobstorage where 'rename' is slow and there is no encryption, it can help performance to stage query results in the scratchdir and use the MoveTask to copy them to blobstorage. This is especially true when there is a FileMerge task. It may also help in cross-filesystem setups, where the user wants to stage query results on the local cluster filesystem and then move them to the destination filesystem.

### Does this PR introduce _any_ user-facing change?
This adds a new Hive configuration, hive.use.scratchdir.for.staging (default: false).

### How was this patch tested?

Tested with the patch on Hive 3.1.2.

Issue Time Tracking
-------------------

Worklog Id: (was: 831677)
Remaining Estimate: 0h
Time Spent: 10m

> Backport HIVE-26758 (Allow use scratchdir for staging final job)
> ----------------------------------------------------------------
>
> Key: HIVE-26815
> URL: https://issues.apache.org/jira/browse/HIVE-26815
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 3.2.0
> Reporter: Yi Zhang
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HIVE-26758 adds an option to stage the final job's output in
> hive.exec.scratchdir. This issue backports that change to 3.2.0.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
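
The staging directory layouts described in the pull request can be illustrated with a small path-building sketch. This is not Hive code: the function `staging_path` and its parameters are hypothetical names chosen for illustration, and the scratch directory value used in the usage lines is an assumed example. It only shows how moving the staging directory ahead of the static partition makes the `<dest_path>/<staging_dir>` prefix replaceable by a scratch directory.

```python
import posixpath


def staging_path(dest_path, staging_dir, static_parts, dynamic_parts,
                 new_layout=True, scratch_dir=None):
    """Build an illustrative staging path for a dynamic-partition insert.

    Hypothetical helper, not part of Hive. static_parts and dynamic_parts
    are lists of 'key=value' path segments.
    """
    if not new_layout:
        # Old layout: <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
        return posixpath.join(dest_path, *static_parts, staging_dir, *dynamic_parts)

    # New layout: <dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>.
    # When a scratch directory is supplied, it replaces the whole
    # <dest_path>/<staging_dir> prefix, as in item 3 of the description.
    if scratch_dir is not None:
        root = scratch_dir
    else:
        root = posixpath.join(dest_path, staging_dir)
    return posixpath.join(root, *static_parts, *dynamic_parts)


# Usage with the example partition spec from the description.
old = staging_path("/warehouse/t", ".staging_dir", ["year=2001"],
                   ["season=xxx"], new_layout=False)
new = staging_path("/warehouse/t", ".staging_dir", ["year=2001"],
                   ["season=xxx"])
scratch = staging_path("/warehouse/t", ".staging_dir", ["year=2001"],
                       ["season=xxx"], scratch_dir="/tmp/hive-scratch")
```

In the old layout the staging directory sits below the static partition, so there is no single prefix to swap out; in the new layout everything under the staging root mirrors the final partition structure, which is what lets a move step copy it into the table path.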