[ 
https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10104:
----------------------------------
    Attachment: HIVE-10104.1.txt

This patch orders the original splits by size and then by name.
Each split's location is derived from a hash of its filename and start position.
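
For readers without the attachment at hand, here is a minimal sketch of the idea described above. It is not the code in HIVE-10104.1.txt. The class, the Split holder, and the method names are hypothetical, ascending size order is an assumption (the comment does not state a direction), and the host list is assumed to arrive in a stable order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only; names and details are assumptions, not Hive's API.
final class ConsistentSplitSketch {

    // Hypothetical stand-in for a file split: file name, start offset, length.
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Returns a sorted copy: by size (ascending, an assumption), then by name.
    // The start offset is a final tie-breaker so the order is total and does
    // not depend on which thread generated which split first.
    static List<Split> sortSplits(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort((a, b) -> {
            int c = Long.compare(a.length, b.length);
            if (c != 0) {
                return c;
            }
            c = a.path.compareTo(b.path);
            if (c != 0) {
                return c;
            }
            return Long.compare(a.start, b.start);
        });
        return sorted;
    }

    // Picks a host from a hash of the file name and start position, so the
    // same split maps to the same host as long as the host list is unchanged.
    static String pickLocation(Split split, List<String> hosts) {
        int hash = Objects.hash(split.path, split.start);
        // floorMod keeps the index non-negative even for negative hashes.
        return hosts.get(Math.floorMod(hash, hosts.size()));
    }
}
```

Because the hash depends only on the file name and start offset, two jobs that see the same files and the same host list would compute the same location for a given split. If hosts join or leave, the modulo mapping shifts for many splits, which is one reason the repeatability only holds when all other factors are the same.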

[~hagleitn] - could you please take a quick look as a sanity check?

Will commit once I've been able to test it on a cluster larger than one node.

> LLAP: Generate consistent splits and locations for the same split across jobs
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-10104
>                 URL: https://issues.apache.org/jira/browse/HIVE-10104
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: llap
>
>         Attachments: HIVE-10104.1.txt
>
>
> Locations for splits are currently randomized. The order of splits is also 
> random, depending on how threads end up generating them.
> Add an option to sort the splits and generate repeatable locations, 
> assuming all other factors are the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
