[ https://issues.apache.org/jira/browse/HIVE-21214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-21214:
----------------------------------
    Attachment: HIVE-21214.3.patch

> MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21214
>                 URL: https://issues.apache.org/jira/browse/HIVE-21214
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-21214.1.patch, HIVE-21214.2.patch, HIVE-21214.3.patch
>
>
> For a given task, if there is more than one attempt, then the deduplication logic kicks in:
> {noformat}
> Utilities.compareTempOrDuplicateFiles(){noformat}
> The logic compares file sizes and picks the largest file. This is very fragile,
> because file size says nothing about which attempt actually succeeded.
> Ideally, it should pick the successful attempt's file.
> However, a simpler solution is to pick the newest attempt and also check that
> the newest attempt's file is the largest.
> If it is not, throw an exception.
>
> cc [~gopalv] [~thejas] [~jdere] [~ekoifman]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
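
The rule proposed in the description (pick the newest attempt, verify that its file is the largest, otherwise throw) can be illustrated with a minimal Java sketch. This is not the code from the attached patches and not the actual body of Utilities.compareTempOrDuplicateFiles(). The class AttemptDedup, the Candidate type, and the assumed file-name layout <taskId>_<attemptId> (for example 000000_1) are hypothetical and used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed rule; not the Hive implementation.
final class AttemptDedup {

    // Assumed file-name layout: <taskId>_<attemptId>, for example "000000_1".
    private static final Pattern NAME = Pattern.compile("^(\\d+)_(\\d+)$");

    // Minimal stand-in for a file's name and length.
    static final class Candidate {
        final String name;
        final long size;

        Candidate(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    // Extracts the attempt ID from a file name of the assumed layout.
    static int attemptId(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized file name: " + name);
        }
        return Integer.parseInt(m.group(2));
    }

    // Returns the file written by the newer attempt.
    // Throws if that file is smaller than the older attempt's file.
    static Candidate pick(Candidate a, Candidate b) {
        Candidate newer = attemptId(a.name) >= attemptId(b.name) ? a : b;
        Candidate older = (newer == a) ? b : a;
        if (newer.size < older.size) {
            throw new IllegalStateException(
                "Newest attempt " + newer.name + " (" + newer.size
                + " bytes) is smaller than " + older.name
                + " (" + older.size + " bytes)");
        }
        return newer;
    }
}
```

With files 000000_0 (100 bytes) and 000000_1 (120 bytes), pick returns the 000000_1 entry. If 000000_1 were the smaller of the two, the sketch throws instead of silently keeping either file, which is the safety check the description asks for.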