[ https://issues.apache.org/jira/browse/IGNITE-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958749#comment-14958749 ]
Mark Howard commented on IGNITE-1267:
-------------------------------------

We've also hit this problem. I think it's broader than the title suggests, though: it affects any node that is not in the original topology, not just new nodes.

In our case, we're using Ignite for relatively long jobs with a very small fanout: most tasks map to a single job, on a cluster of perhaps 100 nodes. Due to the topology restrictions in the collision and failover SPIs, these jobs can never be stolen, either by a new node or by an existing idle node.

The fix is relatively easy for us: comment out the topology checks in the job stealing collision and failover SPIs. This is valid for us since our initial load balancing is relatively straightforward, based on node attributes, and the same node attributes are used in the job stealing configuration. It may not be entirely generic, though, since it's not as powerful as the original TopologySpi that was in early versions of GridGain.

Without that change, though, the collision SPIs are pretty much useless as they stand in the 1.4 release (unless we've missed something!).

> JobStealingCollisionSpi never sends jobs to a node that joined after task was executed
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-1267
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1267
>             Project: Ignite
>          Issue Type: Bug
>          Components: compute
>    Affects Versions: 1.1.4
>            Reporter: Valentin Kulichenko
>              Labels: user-request
>
> Corresponding user thread (contains detailed description of the scenario that doesn't work):
> http://apache-ignite-users.70518.x6.nabble.com/Dynamic-ComputeTask-distribution-with-new-nodes-td997.html
> Essentially, {{JobStealingCollisionSpi}} always skips jobs that are not in task topology (see line 713). Task topology is static and created when task is executed, so newly joined node can't steal jobs. I think it should be able to do this if it satisfies initial cluster group predicate.
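
To make the difference between the two checks concrete, here is a minimal, self-contained sketch. It does not use the Ignite API and is not the SPI's actual code: the class `StealingCheckSketch`, the `Node` type, and both method names are hypothetical, chosen only to contrast a static topology snapshot with re-evaluating the cluster group predicate for the would-be thief node.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model, not Ignite code: it only illustrates the two checks.
final class StealingCheckSketch {

    /** Hypothetical stand-in for a cluster node: an id plus user attributes. */
    static final class Node {
        final String id;
        final Map<String, String> attrs;

        Node(String id, Map<String, String> attrs) {
            this.id = id;
            this.attrs = attrs;
        }
    }

    /**
     * Behaviour described in the issue: the task topology is a snapshot of
     * node ids taken when the task starts, so a node outside it never steals.
     */
    static boolean canStealStatic(Set<String> taskTopologySnapshot, Node thief) {
        return taskTopologySnapshot.contains(thief.id);
    }

    /**
     * Behaviour suggested in the issue: re-check the initial cluster group
     * predicate against the thief, so late joiners can qualify.
     */
    static boolean canStealByPredicate(Predicate<Node> clusterGroupPredicate, Node thief) {
        return clusterGroupPredicate.test(thief);
    }

    public static void main(String[] args) {
        // Snapshot taken at task start; node-3 joins afterwards.
        Set<String> snapshot = Set.of("node-1", "node-2");
        Predicate<Node> workers = n -> "worker".equals(n.attrs.get("role"));
        Node lateJoiner = new Node("node-3", Map.of("role", "worker"));

        System.out.println(canStealStatic(snapshot, lateJoiner));     // prints false
        System.out.println(canStealByPredicate(workers, lateJoiner)); // prints true
    }
}
```

The attribute-based predicate mirrors the setup described in the comment above, where the same node attributes drive both initial load balancing and job stealing; whether that holds for other deployments is an assumption.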