[ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated HIVE-10104:
----------------------------------
    Attachment: HIVE-10104.1.txt

Patch to order the original splits by size and name. Location is based on a hash of the filename and start position.

[~hagleitn] - could you please take a quick look for sanity. Will commit after I'm able to test it a bit on a cluster larger than 1 node.

> LLAP: Generate consistent splits and locations for the same split across jobs
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-10104
>                 URL: https://issues.apache.org/jira/browse/HIVE-10104
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: llap
>
>         Attachments: HIVE-10104.1.txt
>
>
> Locations for splits are currently randomized. Also, the order of splits is
> random - depending on how threads end up generating the splits.
> Add an option to sort the splits, and generate repeatable locations -
> assuming all other factors are the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
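The approach described in the patch comment (order the splits by size and name, then derive a location from a hash of the filename and start position) can be illustrated with a minimal Java sketch. This is not the code in HIVE-10104.1.txt. The `Split` record, the `orderSplits` and `locationFor` helpers, the host list, the largest-first sort direction and the specific hash combination are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of deterministic split ordering and location assignment.
 * Not the HIVE-10104 patch: the names, sort direction and hash are assumptions.
 */
public class ConsistentSplitSketch {

    // Hypothetical minimal split description: file path, start offset, length in bytes.
    public record Split(String path, long start, long length) {}

    // Sort by size (largest first, an assumption), then by file name, then by start
    // offset, so the resulting order no longer depends on which generator thread
    // happened to produce a split first.
    public static List<Split> orderSplits(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(Split::length).reversed()
                .thenComparing(Split::path)
                .thenComparingLong(Split::start));
        return sorted;
    }

    // Pick a host from a hash of the file name and start offset. String.hashCode
    // and Long.hashCode are fully specified by the JDK, so the same split maps to
    // the same host across JVMs and jobs, as long as the host list is identical
    // and in the same order. floorMod keeps the index non-negative.
    public static String locationFor(Split split, List<String> hosts) {
        int h = 31 * split.path().hashCode() + Long.hashCode(split.start());
        return hosts.get(Math.floorMod(h, hosts.size()));
    }
}
```

Because both steps are pure functions of the split metadata and the host list, two jobs that see the same files and the same hosts get the same split order and the same locations, which matches the "assuming all other factors are the same" condition in the issue description.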