> On Aug. 30, 2016, 10:35 p.m., Siddharth Seth wrote: > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java, > > line 554 > > <https://reviews.apache.org/r/51312/diff/1/?file=1481607#file1481607line554> > > > > The size of this collection is used to determine the #knownWorkers in > > HostAffinitySplitLocationProvider. The size at the moment represents active > > nodes. > > > > I don't think the intent is to return a fewer number of nodes than what > > existed if a node goes down? > > 1) Return n entries, and one entry would indicate it's not active (and > > this will need to be acted upon by all clients > > 2) Add a getNumInstances itnerface, which can differ from actual nodes > > available, along with a getNodeAtLocation interface? > > Sergey Shelukhin wrote: > HIVE-14680 > > Siddharth Seth wrote: > Think this is very incomplete without this change. I don't think it's any > worse than it is today though. > > Sergey Shelukhin wrote: > Why? It handles consistency in most scenarios as described. There's just > a small window during the crash where splits will not get good locations.
The 'small window' is assuming the node comes back up in the fisrt place. Minutes is enough to cause damage to the cache on multiple nodes. What happens on a cluster where a physical node is taken out for maintenance, and there's no capacity to launch another node. What happens in case of an hour long maintenence window for a node/set of nodes. Will the consisten caching part take care of this ? (The target bucket size will be decreasing). The bucket size could be based on the number of instances requested by the slider app. > On Aug. 30, 2016, 10:35 p.m., Siddharth Seth wrote: > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java, > > line 561 > > <https://reviews.apache.org/r/51312/diff/1/?file=1481607#file1481607line561> > > > > Here, as well as other places, a node is only available when it's slot > > has been registered. Otherwise it should not be visible to clients. > > Sergey Shelukhin wrote: > I think it only makes sense for ordered, to not mess up the order. > Changed that. It's also problematic because we'd have to keep track of > coordinated notifications. > > Siddharth Seth wrote: > Assuming notifications for registered workers behave in the following > manner. > > NewNode - only when the slot has been registered. > Node down - when either the slot or the actual node goes away. > State changed - I'm not sure anyone is expected to act on this (what are > the possible notifications here). > > Sergey Shelukhin wrote: > Why? It doesn't seem to provide a lot of benefit compared to existing > worker node handling, in presence of crashes/etc. It does provide consistency, and makes it easier for someone who does not know this specific detail about the two ZK nodes per instance to reason about potential problems. Eseentially easier to undestand and maintain. - Siddharth ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51312/#review147362 ----------------------------------------------------------- On Sept. 1, 2016, 6:35 p.m., Sergey Shelukhin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51312/ > ----------------------------------------------------------- > > (Updated Sept. 1, 2016, 6:35 p.m.) > > > Review request for hive and Prasanth_J. > > > Repository: hive-git > > > Description > ------- > > see jira > > > Diffs > ----- > > llap-client/pom.xml 0243340 > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java > 99ead9b > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java > e9456f2 > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java > 64d2617 > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java > PRE-CREATION > > llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java > 921e050 > > llap-client/src/test/org/apache/hadoop/hive/llap/registry/impl/TestSlotZnode.java > PRE-CREATION > > llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java > 17ce69b > > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java > efd774d > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 8a4fc08 > > ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java > 54f7363 > > Diff: https://reviews.apache.org/r/51312/diff/ > > > Testing > ------- > > > Thanks, > > Sergey Shelukhin > >