[ 
https://issues.apache.org/jira/browse/HIVE-25521?focusedWorklogId=661123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661123
 ]

ASF GitHub Bot logged work on HIVE-25521:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/21 18:37
            Start Date: 06/Oct/21 18:37
    Worklog Time Spent: 10m 
      Work Description: harishjp commented on a change in pull request #2639:
URL: https://github.com/apache/hive/pull/2639#discussion_r723577374



##########
File path: 
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFileStripeMergeRecordReader.java
##########
@@ -34,7 +35,7 @@
 
 public class TestOrcFileStripeMergeRecordReader {
 
-  private final int DEFAULT_STRIPE_SIZE = 5000;
+  private static final int DEFAULT_STRIPE_SIZE = 5000;

Review comment:
       I think this is just a test stripe size; creating 64k rows in the test 
would be too much when it provides no extra benefit. Some arbitrary value was 
probably chosen as the default. The name is definitely misleading.
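
A rename along the following lines would make the intent clearer (the name 
TEST_STRIPE_SIZE is only a suggestion, not part of the PR):

    // Small stripe size used to keep the test data manageable;
    // this is not ORC's production default stripe size.
    private static final int TEST_STRIPE_SIZE = 5000;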




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 661123)
    Time Spent: 50m  (was: 40m)

> Data corruption when concatenating files with different compressions in same 
> table/partition
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25521
>                 URL: https://issues.apache.org/jira/browse/HIVE-25521
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish JP
>            Assignee: Harish JP
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, if files with different compressions are in the same directory, 
> concatenate can fail and cause data corruption. This happens because a file 
> can be moved by one task as an incompatible file, and the other tasks will 
> then fail.
>  
> This Jira addresses the issue by processing a file only in the task whose 
> split starts at offset 0, and ignoring the file in all other tasks.
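>  
> A rough sketch of that guard (for illustration only; the class and method 
> names here are hypothetical, not the actual patch):
>  
>     /** Minimal sketch: only the task whose split of an incompatible
>      *  file starts at byte offset 0 handles (moves) that file. */
>     final class IncompatibleFileGuardSketch {
>       static boolean shouldHandleIncompatibleFile(long splitStart) {
>         // Exactly one split of a file starts at offset 0, so exactly one
>         // task ends up owning the file; all other tasks skip it and can
>         // no longer race on the move.
>         return splitStart == 0;
>       }
>     }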



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
