Just some more info: I use a ContinuousMapper within a task that sends off these jobs. Could this maybe be the reason?
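For context, a task that streams jobs out through a continuous mapper is shaped roughly like this. This is a minimal sketch with illustrative names (`PythonToolTask` and its argument type are not from this thread), not the actual code under discussion:

```scala
import java.util.{List => JList, Map => JMap}
import org.apache.ignite.cluster.ClusterNode
import org.apache.ignite.compute._
import org.apache.ignite.resources.TaskContinuousMapperResource

// Sketch: a task that sends jobs through the injected continuous mapper
// instead of returning a job-to-node map from map().
class PythonToolTask extends ComputeTaskAdapter[Seq[String], Integer] {
  @TaskContinuousMapperResource
  private var mapper: ComputeTaskContinuousMapper = _

  override def map(subgrid: JList[ClusterNode],
                   args: Seq[String]): JMap[ComputeJob, ClusterNode] = {
    for (arg <- args)
      mapper.send(new ComputeJobAdapter {
        override def execute(): AnyRef = {
          /* launch the external tool for `arg` here */
          arg
        }
      })
    null // jobs were already sent through the mapper
  }

  override def reduce(results: JList[ComputeJobResult]): Integer =
    results.size()
}
```

Jobs sent this way are still routed by the load balancer and queued through the collision SPI on the receiving node, so in principle stealing should apply to them as well, but I am not certain the continuous mapper is ruled out as a factor here.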
On Fri, 30 Aug 2019 at 21:49, Pascoe Scholle <pascoescholletr...@gmail.com> wrote:

> Hi Ilya,
>
> So I have exciting news. As you say, we have to configure some of the
> nodes manually, and I have done this using the Collision SPI; it works
> great!
>
> I have nodes configured for the more powerful PCs that also have more
> RAM available, and I limit the number of jobs that can run in parallel.
> Really awesome feature.
>
> I now have another issue.
>
> Say I have nodes A and B, both with the attribute "compute", so all
> jobs deployed on the cluster are executed only on these two nodes.
>
> Node A runs on a weaker machine, so I limit the number of parallel
> jobs to 2.
>
> Node B runs on the stronger machine, with 4 parallel jobs running at
> the same time.
>
> Say I send off 100 jobs to the cluster and these are evenly
> distributed between the two nodes. Node B will obviously finish its
> queue much faster.
>
> So I tried to implement a JobStealingCollisionSpi, but it's not
> stealing any jobs from Node A.
>
> Here is some code to show the configuration; can you point out what
> is missing?
>
> This is the job-stealer node, with job stealing enabled:
>
> """"""""""""""""""""""""""""""""""""""""""""""""""""""
>
> val jobStealer = new JobStealingCollisionSpi();
> jobStealer.setStealingEnabled(true);
>
> jobStealer.setStealingAttributes(Map("compute.node" -> "true"));
>
> /* set the number of parallel jobs allowed */
> jobStealer.setActiveJobsThreshold(10);
>
> // Configure the number of stealing attempts.
> jobStealer.setMaximumStealingAttempts(10);
>
> val failoverSpi = new JobStealingFailoverSpi();
>
> val cfg = new IgniteConfiguration();
>
> val cacheConfig = new CacheConfiguration("myCache");
>
> /* partition the cache over the entire cluster */
> cacheConfig.setCacheMode(CacheMode.PARTITIONED);
>
> cfg.setCacheConfiguration(cacheConfig);
>
> cfg.setMetricsUpdateFrequency(10);
>
> cfg.setFailoverSpi(failoverSpi);
>
> cfg.setPeerClassLoadingEnabled(true);
>
> cfg.setCollisionSpi(jobStealer);
>
> /* jobs are sent to this node for computation */
> cfg.setUserAttributes(Map("compute.node" -> true));
>
> """"""""""""""""""""""""""""""""""""""""""""
>
> Next, the configuration for Node B. Job stealing is disabled, so that
> other nodes can take jobs from this node:
>
> """""""""""""""""""""""""""""""
> val jobStealer = new JobStealingCollisionSpi();
>
> jobStealer.setStealingEnabled(false);
>
> /* set the number of parallel jobs allowed */
> jobStealer.setActiveJobsThreshold(2);
>
> jobStealer.setStealingAttributes(Map("compute.node" -> "true"));
>
> val failoverSpi = new JobStealingFailoverSpi();
>
> val cfg = new IgniteConfiguration();
>
> val cacheConfig = new CacheConfiguration("myCache");
>
> cacheConfig.setCacheMode(CacheMode.PARTITIONED);
>
> cfg.setCacheConfiguration(cacheConfig);
>
> cfg.setPeerClassLoadingEnabled(true);
>
> cfg.setFailoverSpi(failoverSpi);
>
> cfg.setCollisionSpi(jobStealer);
>
> cfg.setUserAttributes(Map("compute.node" -> true));
>
> """"""""""""""""""""""""""""""""""""""""""
>
> I also have a third node, reserved for running services, with a
> different attribute.
>
> Any pointers on why the job-stealer node is not stealing any jobs?
>
> Thanks!
>
> On Fri, 30 Aug 2019 at 15:02, Ilya Kasnacheev
> <ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> I think you will have to do it semi-manually: on every node you
>> should know how many resources are available; don't allow jobs to
>> overconsume.
>>
>> You could try to use LoadBalancingSpi, but as far as I have heard,
>> it is not trivial.
>>
>> Ignite is not a scheduler, so we don't have any built-ins for
>> resource control.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> On Thu, 29 Aug 2019 at 14:27, Pascoe Scholle
>> <pascoescholletr...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a question regarding the workings of the load balancer and
>>> running processes which are outside the JVM.
>>>
>>> We have a Python command-line tool which is used for processing big
>>> data. The tool is highly optimized and is able to instantly load
>>> data into RAM. Some data can be as large as 20 GB.
>>>
>>> When sending multiple jobs, each triggering its own Python process,
>>> I do not want this to occur on the same machine. Is there any way we
>>> can use the load balancer to ensure that all jobs are evenly
>>> distributed, or possibly restrict certain jobs to a node running on
>>> a desktop we know has enough memory available?
>>>
>>> For example, I have two machines, one with 32 GB of RAM and the
>>> second with 64 GB, one Ignite node per machine. Three jobs are sent
>>> using ComputeTaskContinuousMapper. The 32 GB machine received two
>>> jobs and obviously froze up. Having some way of ensuring that two
>>> jobs are sent to the machine with more memory would be really
>>> helpful.
>>>
>>> Thanks and kind regards,
>>> Pascoe
>>>
>>
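A few details of job stealing are easy to miss and may be relevant to the configurations above. The following is a sketch based on my reading of the JobStealingCollisionSpi documentation, not a confirmed fix: both JobStealingCollisionSpi and JobStealingFailoverSpi have to be configured on every node involved (including the node that maps the task), stealing is triggered by the wait-jobs threshold on the overloaded node, and the stealing attributes are compared against the remote node's user attributes, so the types must match exactly (a String "true" will not equal a Boolean true):

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi
import org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi
import scala.jdk.CollectionConverters._

// Sketch: the same SPI wiring on every node that takes part in stealing,
// differing only in the per-node parallel-job limit.
def stealingConfig(maxActiveJobs: Int): IgniteConfiguration = {
  val collisionSpi = new JobStealingCollisionSpi()
  // Stealing kicks in when a node's wait queue exceeds this threshold;
  // 0 means any waiting job may be stolen by an underloaded node.
  collisionSpi.setWaitJobsThreshold(0)
  collisionSpi.setActiveJobsThreshold(maxActiveJobs)
  collisionSpi.setMaximumStealingAttempts(10)

  val cfg = new IgniteConfiguration()
  cfg.setCollisionSpi(collisionSpi)
  // JobStealingCollisionSpi works only together with JobStealingFailoverSpi.
  cfg.setFailoverSpi(new JobStealingFailoverSpi())
  // Use the same type here as in setStealingAttributes: the String "true",
  // not the Boolean true, so attribute comparison can succeed.
  cfg.setUserAttributes(Map[String, AnyRef]("compute.node" -> "true").asJava)
  cfg
}
```

In the configurations quoted above, the user attribute is set to a Boolean (`-> true`) while the stealing attribute is a String (`-> "true"`); if the SPI compares them by equality, that mismatch alone could prevent any stealing.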