Dmitry, Nikolay, thanks for your help! Since the changes are merged we are able to implement this feature in Compatibility Testing Framework.
It provides us to work flexibly with multi-version cluster and implement new testing scenarios (I know that Ignite doesn't have such feature yet). Does it make sense to create a ticket? On Wed, Mar 21, 2018 at 8:16 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote: > Hi Nickolay, > > it seems we have lazy consesus here. > > Failures: tests 11 suites 1, all these tests are failed in master. > > Could you merge? > > Sincerely, > Dmitriy Pavlov > > пт, 16 мар. 2018 г. в 18:29, Nikolay Izhikov <nizhi...@apache.org>: > >> Hello, Guys. >> >> I'm reviewed changes and it looks good to me. >> There is a simple reproducer for a bug in test framework, see below. >> >> It fails in master and works in branch. >> >> I'm planning to merge the fix [1] if Run All will be OK. >> >> Please, write to me if you have any objections. >> >> [1] https://github.com/apache/ignite/pull/2382 >> >> ``` >> public class MultiJvmSelfTest extends GridCommonAbstractTest { >> @Override protected boolean isMultiJvm() { return true; } >> >> public void testGrid() throws Exception { >> final IgniteInternalFuture fut = GridTestUtils.runAsync(new >> RunnableX() { >> @Override public void runx() throws Exception { >> try { >> startGrid(0); >> startGrid(1); >> } >> finally { >> stopGrid(1); >> stopGrid(0); >> } >> } >> }); >> >> try { >> fut.get(20_000L); >> } finally { >> stopAllGrids(true); >> } >> } >> } >> ``` >> >> В Чт, 15/03/2018 в 15:59 +0000, Dmitry Pavlov пишет: >> > I see now. Thank you. >> > >> > Nikolay, could you please merge this change? >> > >> > чт, 15 мар. 2018 г. в 18:48, Vyacheslav Daradur <daradu...@gmail.com>: >> > >> > > In brief: >> > > Nodes in *separate* JVMs are shutting down by the computing task >> > > *StopGridTask* which has sent from *local* JVM *synchronously* that >> > > means *local* node must wait for task's finish. >> > > >> > > At the same time when a node in *separate* JVM executes the received >> > > *StopGridTask* which *synchronously* calls *G.stop(igniteInstanceName, >> > > FALSE)* which is waiting for all computing task's finish, including >> > > *StopGridTask* which has invoked it. >> > > >> > > We have some kind of deadlock: >> > > *Local* node is waiting for the computing task's finish which is >> > > waiting for finish of execution *G.stop* which is waiting for all >> > > computing tasks finish including *StopGridTask*. >> > > >> > > We have not noticed that before because we use only stopAllGrids() in >> > > out tests which stop local JVM without waiting for nodes in other >> > > JVMs. >> > > >> > > >> > > >> > > On Thu, Mar 15, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov....@gmail.com> >> > > wrote: >> > > > Please address comments in PR. >> > > > >> > > > I did not fully understood why sync GridStopMessage message was >> lost, but >> > > > async will be successfull. Probably we need discuss it briefly. >> > > > >> > > > чт, 1 мар. 2018 г. в 12:11, Vyacheslav Daradur <daradu...@gmail.com >> >: >> > > > > >> > > > > Thank you, Dmitry! >> > > > > >> > > > > I'll join this review soon. >> > > > > >> > > > > On Thu, Mar 1, 2018 at 12:07 PM, Dmitry Pavlov < >> dpavlov....@gmail.com> >> > > > > wrote: >> > > > > > Hi Vyacheslav, >> > > > > > >> > > > > > I will take a look, but first of all I am going to review >> > > > > > https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502 - >> it is >> > > > > > impact >> > > > > > change in testing framework. Hope you also will join to this >> review . >> > > > > > >> > > > > > Sincerely, >> > > > > > Dmitiry Pavlov >> > > > > > >> > > > > > >> > > > > > чт, 1 мар. 2018 г. в 11:13, Vyacheslav Daradur < >> daradu...@gmail.com>: >> > > > > > > >> > > > > > > Hi, Dmitry, could you please review it, because you are one of >> the >> > > > > > > most experienced people in the testing framework. >> > > > > > > >> > > > > > > Please see comment in Jira, because it is in pretty-format >> there. >> > > > > > > >> > > > > > > On Thu, Feb 22, 2018 at 11:56 AM, Vyacheslav Daradur >> > > > > > > <daradu...@gmail.com> wrote: >> > > > > > > > Hi Igniters! >> > > > > > > > >> > > > > > > > I have investigated the issue [1] and found that stopping >> node in >> > > > > > > > separate JVM may stuck thread or leave system process alive >> after >> > > > > > > > test >> > > > > > > > finished. >> > > > > > > > The main reason is *StopGridTask* that we send from node in >> local >> > > >> > > JVM >> > > > > > > > to node in separate JVM via remote computing. >> > > > > > > > We send job synchronously to be sure that node will be >> stopped, but >> > > > > > > > job calls synchronously *G.stop(igniteInstanceName, >> cancel))* with >> > > > > > > > *cancel = false*, that means node must wait to compute jobs >> before >> > > >> > > it >> > > > > > > > goes down what leads to some kind of deadlock. Using of >> *cancel = >> > > > > > > > true* would solve the issue but may break some tests’ logic, >> for >> > > >> > > this >> > > > > > > > reason, I've reworked the method’s synchronization logic [2]. >> > > > > > > > >> > > > > > > > We have not noticed that before because we use only >> > > >> > > *stopAllGrids()* >> > > > > > > > in out tests which stop local JVM without waiting for nodes >> in >> > > >> > > other >> > > > > > > > JVMs. >> > > > > > > > I believe this fix should reduce the number of flaky tests on >> > > > > > > > TeamCity, especially which fails because of a cluster from >> the >> > > > > > > > previous test has not been stopped properly. >> > > > > > > > >> > > > > > > > Ci.tests [3] look a bit better than in master. >> > > > > > > > Please review prepared PR [2] and share your thoughts. >> > > > > > > > >> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5910 >> > > > > > > > [2] https://github.com/apache/ignite/pull/2382 >> > > > > > > > [3] >> https://ci.ignite.apache.org/viewLog.html?buildId=1105939 >> > > > > > > > >> > > > > > > > >> > > > > > > > On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur >> > > > > > > > <daradu...@gmail.com> wrote: >> > > > > > > > > Hi Igniters, >> > > > > > > > > >> > > > > > > > > Working on my task I found a bug at call the method >> > > >> > > #stopGrid(name), >> > > > > > > > > it produced ClassCastException. I created a ticket[1]. >> > > > > > > > > >> > > > > > > > > After it was fixed[2] I saw that nodes which was started >> in a >> > > > > > > > > separate >> > > > > > > > > JVM >> > > > > > > > > could stay in process of operation system. >> > > > > > > > > It was fixed too, but not sure is it fixed in proper way >> or not. >> > > > > > > > > >> > > > > > > > > Could someone review it? >> > > > > > > > > >> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5910 >> > > > > > > > > [2] https://github.com/apache/ignite/pull/2382 >> > > > > > > > > >> > > > > > > > > -- >> > > > > > > > > Best Regards, Vyacheslav D. >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > -- >> > > > > > > > Best Regards, Vyacheslav D. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > Best Regards, Vyacheslav D. >> > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Best Regards, Vyacheslav D. >> > > >> > > >> > > >> > > -- >> > > Best Regards, Vyacheslav D. >> > > -- Best Regards, Vyacheslav D.