Hi Igniters!

I have investigated the issue [1] and found that stopping a node in a separate JVM may leave a thread stuck or leave a system process alive after the test has finished.

The main cause is *StopGridTask*, which we send from the node in the local JVM to the node in the separate JVM via remote computing. We send the job synchronously to be sure that the node will be stopped, but the job itself synchronously calls *G.stop(igniteInstanceName, cancel)* with *cancel = false*. That means the node must wait for its compute jobs to complete before it goes down, and that includes the stop job itself, which leads to a kind of deadlock.

Using *cancel = true* would solve the issue but may break the logic of some tests. For this reason, I've reworked the method's synchronization logic [2].
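
To illustrate the self-wait described above, here is a minimal sketch in plain java.util.concurrent. It is only an analogy and does not use Ignite code: the class StopFromJobSketch and its methods are hypothetical names, the single-thread pool stands in for the remote node, and stopGracefully stands in for a stop with *cancel = false*. The second method shows one possible way to avoid the self-wait (handing the stop over to a separate thread), which is not necessarily what the PR [2] does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical analogy: a job that gracefully stops the pool it runs in. */
class StopFromJobSketch {
    /** Stand-in for a graceful stop: waits until all running jobs have finished. */
    static boolean stopGracefully(ExecutorService pool, long timeoutMs) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** The job stops the pool synchronously, so the stop waits for the job itself. */
    static boolean stopFromInsideJob(long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // The worker thread is still busy running this job, so the pool
        // cannot terminate before the timeout expires.
        Future<Boolean> job = pool.submit(() -> stopGracefully(pool, timeoutMs));
        return job.get();
    }

    /** The job hands the stop over to another thread and returns immediately. */
    static boolean stopFromSeparateThread(long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Boolean> stopped = new CompletableFuture<>();
        Future<?> job = pool.submit(() -> {
            Thread stopper = new Thread(() -> {
                try {
                    stopped.complete(stopGracefully(pool, timeoutMs));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    stopped.complete(false);
                }
            });
            stopper.start();
        });
        job.get();            // the job itself completes right away
        return stopped.get(); // the stop can now finish, because no job is running
    }
}
```

In this sketch the first variant only gives up because of the timeout; with an unbounded wait it would never return, which is the analogue of the stuck thread or the process that stays alive.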

We have not noticed this before because our tests use only *stopAllGrids()*, which stops the nodes in the local JVM without waiting for nodes in other JVMs.

I believe this fix should reduce the number of flaky tests on TeamCity, especially those that fail because a cluster from the previous test has not been stopped properly.

The CI tests [3] look a bit better than in master.

Please review the prepared PR [2] and share your thoughts.

[1] https://issues.apache.org/jira/browse/IGNITE-5910
[2] https://github.com/apache/ignite/pull/2382
[3] https://ci.ignite.apache.org/viewLog.html?buildId=1105939

On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur <daradu...@gmail.com> wrote:

> Hi Igniters,
>
> Working on my task I found a bug at call the method #stopGrid(name),
> it produced ClassCastException. I created a ticket[1].
>
> After it was fixed[2] I saw that nodes which was started in a separate JVM
> could stay in process of operation system.
> It was fixed too, but not sure is it fixed in proper way or not.
>
> Could someone review it?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5910
> [2] https://github.com/apache/ignite/pull/2382
>
> --
> Best Regards, Vyacheslav D.

--
Best Regards, Vyacheslav D.