[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234769#comment-14234769
 ] 

Jihong Liu commented on HIVE-8966:
----------------------------------

I think we may have to withdraw this patch for now. It looks like Hive 
currently does not support compacting and loading a partition at the same 
time. 
Without this patch, if loading into a partition has not completely finished, 
compaction always fails, so nothing happens. With this patch applied, 
compaction goes through and finishes. However, we may lose data! I ran a test: 
data can be lost if compaction runs while loading is still in progress. 
But if we keep the current version, this remains a limitation of Hive. If data 
is streamed into a partition over a long period, performance will suffer 
because the partition cannot be compacted. 

To solve this issue completely, my initial thinking is that delta files 
with open transactions should not be compacted. Currently they appear to be 
included, which is probably the reason for the data loss. Other, closed delta 
files should still be compactable, so that compaction and loading can run at 
the same time.
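
The idea above can be sketched roughly as follows. This is only an 
illustration, not Hive code: it assumes delta directories are named 
delta_<minTxn>_<maxTxn>, and the class CompactableDeltas, the method 
selectCompactable, and the set of open transaction ids are all hypothetical 
names. A delta is skipped whenever any open transaction id falls inside its 
range; all other deltas stay eligible for compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hive: picks the delta directories that
// contain no open transaction and are therefore safe to compact.
class CompactableDeltas {

    // Assumption: delta directory names look like delta_<minTxn>_<maxTxn>.
    static long[] parseTxnRange(String deltaDir) {
        String[] parts = deltaDir.split("_");
        if (parts.length != 3 || !parts[0].equals("delta")) {
            throw new IllegalArgumentException("Not a delta directory: " + deltaDir);
        }
        return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
    }

    // Returns the deltas whose [minTxn, maxTxn] range holds no open transaction.
    static List<String> selectCompactable(List<String> deltaDirs, Set<Long> openTxns) {
        TreeSet<Long> open = new TreeSet<>(openTxns);
        List<String> result = new ArrayList<>();
        for (String dir : deltaDirs) {
            long[] range = parseTxnRange(dir);
            // Smallest open transaction id that is >= minTxn, if any.
            Long firstOpen = open.ceiling(range[0]);
            boolean hasOpenTxn = firstOpen != null && firstOpen <= range[1];
            if (!hasOpenTxn) {
                result.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deltas = Arrays.asList(
            "delta_0000001_0000010",
            "delta_0000011_0000020",   // still being written: txn 15 is open
            "delta_0000021_0000030");
        Set<Long> openTxns = new HashSet<>(Arrays.asList(15L));
        System.out.println(selectCompactable(deltas, openTxns));
    }
}
```

With transaction 15 open, only the first and third delta would be handed to 
the compactor, and the second would wait until its transactions are closed.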


> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.1
>
>         Attachments: HIVE-8966.patch
>
>
> Hive HCatalog streaming also creates a file named bucket_n_flush_length in 
> each delta directory, where "n" is the bucket number. compactor.CompactorMR 
> thinks this file also needs to be compacted. However, this file of course 
> cannot be compacted, so compactor.CompactorMR does not continue with the 
> compaction. 
> In a test, after I removed the bucket_n_flush_length file, "alter 
> table partition compact" finished successfully. If that file is not deleted, 
> nothing is compacted. 
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
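
One way to keep the side file described above out of the compaction input is 
to filter the file listing by name. The sketch below is an assumption-laden 
illustration, not the actual CompactorMR code: it assumes data files are named 
bucket_<n> and the side file bucket_<n>_flush_length, and the class 
BucketFileFilter and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: keeps only real bucket data files.
class BucketFileFilter {

    // Assumption: bucket data files are named bucket_<digits>.
    private static final Pattern BUCKET_FILE = Pattern.compile("bucket_\\d+");

    // Suffix of the side file written by streaming, as named in this issue.
    static final String SIDE_FILE_SUFFIX = "_flush_length";

    static boolean isCompactableBucketFile(String fileName) {
        return !fileName.endsWith(SIDE_FILE_SUFFIX)
            && BUCKET_FILE.matcher(fileName).matches();
    }

    // Returns the file names that should be passed on to compaction.
    static List<String> compactionInputs(List<String> fileNames) {
        List<String> inputs = new ArrayList<>();
        for (String name : fileNames) {
            if (isCompactableBucketFile(name)) {
                inputs.add(name);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "bucket_00000", "bucket_00000_flush_length", "bucket_00001");
        System.out.println(compactionInputs(listing));
    }
}
```

Filtering by name avoids having to delete the side file by hand, but it does 
not by itself make it safe to compact a delta that is still being written.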



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
