[ 
https://issues.apache.org/jira/browse/HIVE-22938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-22938:
------------------------------
    Description: 
As a follow-up to HIVE-22918, this ticket is to investigate whether the empty 
bucket file creation mechanism can be removed safely when using MR as the 
engine. 

For a bucketed table of N buckets, each insert will generate N bucket files in 
the delta directory, regardless of how many actual buckets are written to. As 
an example, if a table has 500 buckets, and we insert a single record, 499 
empty bucket files are generated alongside the single bucket that contains the 
actual data. Creating the empty files makes the insert substantially slower in 
some cases. This behaviour appears to occur only when MR is the execution 
engine.
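
To make the file counts concrete, here is a rough sketch of the arithmetic 
described above. It is not Hive code: the function name is hypothetical, and 
the bucket assignment (key modulo bucket count) is only an assumed stand-in for 
Hive's real bucketing hash.

```python
def bucket_file_summary(num_buckets, keys):
    """Return (files_created, non_empty, empty) for a single insert."""
    # Assumption: bucket id = key mod num_buckets. This is an illustrative
    # stand-in, not Hive's actual bucketing hash function.
    written = {key % num_buckets for key in keys}
    # Behaviour described in this ticket for MR: one file per bucket,
    # whether or not any row landed in that bucket.
    files_created = num_buckets
    return files_created, len(written), files_created - len(written)


# The example from the description: 500 buckets, a single inserted record.
print(bucket_file_summary(500, [42]))  # (500, 1, 499)
```

If the empty files were no longer created, the first value would drop to the 
number of buckets actually written to, which is 1 in this example.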

Other parts of the code might depend on these empty files though, so we need 
to verify that removing this logic does not break anything.

  was:
As a follow-up to 
[HIVE-22918|https://issues.apache.org/jira/browse/HIVE-22918], this ticket is 
to investigate whether the empty bucket file creation mechanism can be removed 
safely from MR. 

For a bucketed table of N buckets, each insert will generate N bucket files in 
the delta directory, regardless of how many actual buckets are written to. As 
an example, if a table has 500 buckets, and we insert a single record, 499 
empty bucket files are generated alongside the single bucket that contains the 
actual data. This makes the operation substantially slower in some cases. This 
behaviour only seems to happen when using MR as the execution engine.

Some components/parts of the code might depend on this behaviour though, so it 
needs to be verified that removing this logic does not interfere with anything.


> Investigate possibility of removing empty bucket file creation mechanism in 
> Hive-on-MR
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-22938
>                 URL: https://issues.apache.org/jira/browse/HIVE-22938
>             Project: Hive
>          Issue Type: Task
>            Reporter: Marton Bod
>            Priority: Major
>
> As a follow-up to HIVE-22918, this ticket is to investigate whether the empty 
> bucket file creation mechanism can be removed safely when using MR as the 
> engine. 
> For a bucketed table of N buckets, each insert will generate N bucket files 
> in the delta directory, regardless of how many actual buckets are written to. 
> As an example, if a table has 500 buckets, and we insert a single record, 499 
> empty bucket files are generated alongside the single bucket that contains 
> the actual data. Creating the empty files makes the insert substantially 
> slower in some cases. This behaviour appears to occur only when MR is the 
> execution engine.
> Other parts of the code might depend on these empty files though, so we need 
> to verify that removing this logic does not break anything.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
