[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492075#comment-15492075 ]
Siddharth Seth commented on HIVE-14680:
---------------------------------------

Mostly looks good.

Slots start at 0, correct?

{code}
// Since our probing method is totally bogus, give up after some time and return everything.
{code}
I think it would be better to return nothing here. That will cause the scheduler to go random. Returning everything has a good chance of overwhelming a box, at least without grouping. With grouping, it may spread out.

It would be good to display the number of expected instances on the LlapWebService (it currently shows only the ones which are up and running). It could probably also show the ones which have gone down, with isAlive set to false. Separate jira?

[~gopalv] - any comments on the second level hash function, and how it moves between hosts with the (+1 * hash2)?

{code}
startOffset >> 2
{code}
This is unrelated to this patch specifically, but related to the series of patches. This is trying to avoid differences in the start position, correct? Up to a difference of 4. Will it work for splits that don't start at 0?

> retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14680
>                 URL: https://issues.apache.org/jira/browse/HIVE-14680
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-14680.01.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) is to return locations for all slots to HostAffinitySplitLocationProvider, the missing slots being inactive locations (based solely on the last slot actually present). For the splits mapped to these locations, fall back via different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone (consistent hashing is supposed to be good for this?); however for that we'd need more involved coordination between nodes or a central updater to indicate the number of nodes.
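
For readers following the thread, the fallback described above (inactive slots, a second hash used as a probe step, and giving up after a bounded number of attempts) can be sketched roughly as follows. This is only an illustration under assumptions, not the code in the attached patch: the class `SlotProbingSketch`, the method `pickHost`, and the parameters `hash1`, `hash2` and `maxProbes` are hypothetical names, inactive slots are modelled as `null`, and giving up returns an empty result, as suggested in the comment, rather than every location.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SlotProbingSketch {

    /**
     * Picks a live host for a split, or returns empty if probing gives up.
     * slots: one entry per slot, null marking an inactive slot (assumption).
     * hash1: primary hash of the split; hash2: second hash used as probe step.
     */
    static Optional<String> pickHost(List<String> slots, int hash1, int hash2, int maxProbes) {
        int n = slots.size();
        if (n == 0) {
            return Optional.empty();
        }
        int start = Math.floorMod(hash1, n);
        // Step in [1, n-1] so consecutive probes never stay on the same slot when n > 1.
        int step = n > 1 ? 1 + Math.floorMod(hash2, n - 1) : 1;
        for (int i = 0; i < maxProbes; i++) {
            int index = (int) ((start + (long) i * step) % n);
            String host = slots.get(index);
            if (host != null) {
                return Optional.of(host);
            }
        }
        // Give up and return nothing, leaving the choice to the scheduler.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Slots 1 and 3 are inactive in this made-up example.
        List<String> slots = Arrays.asList("host0", null, "host2", null);
        System.out.println(pickHost(slots, 1, 0, 4)); // probes slot 1, then slot 2
        System.out.println(pickHost(slots, 3, 1, 4)); // step 2 only visits slots 3 and 1
    }
}
```

In the second call the step shares a factor with the slot count, so the probe sequence alternates between the two inactive slots and never reaches a live one. That is one reason a probe limit and a defined give-up behaviour are needed in a scheme like this.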