Dmitry, Nikolay, thanks for your help!

Since the changes are merged we are able to implement this feature in
Compatibility Testing Framework.

It provides us to work flexibly with multi-version cluster and
implement new testing scenarios (I know that Ignite doesn't have such
feature yet).

Does it make sense to create a ticket?


On Wed, Mar 21, 2018 at 8:16 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
> Hi Nickolay,
>
> it seems we have lazy consesus here.
>
> Failures:  tests 11 suites 1, all these tests are failed in master.
>
> Could you merge?
>
> Sincerely,
> Dmitriy Pavlov
>
> пт, 16 мар. 2018 г. в 18:29, Nikolay Izhikov <nizhi...@apache.org>:
>
>> Hello, Guys.
>>
>> I'm reviewed changes and it looks good to me.
>> There is a simple reproducer for a bug in test framework, see below.
>>
>> It fails in master and works in branch.
>>
>> I'm planning to merge the fix [1] if Run All will be OK.
>>
>> Please, write to me if you have any objections.
>>
>> [1] https://github.com/apache/ignite/pull/2382
>>
>> ```
>> public class MultiJvmSelfTest extends GridCommonAbstractTest {
>>     @Override protected boolean isMultiJvm() { return true; }
>>
>>     public void testGrid() throws Exception {
>>         final IgniteInternalFuture fut = GridTestUtils.runAsync(new
>> RunnableX() {
>>             @Override public void runx() throws Exception {
>>                 try {
>>                     startGrid(0);
>>                     startGrid(1);
>>                 }
>>                 finally {
>>                     stopGrid(1);
>>                     stopGrid(0);
>>                 }
>>             }
>>         });
>>
>>         try {
>>             fut.get(20_000L);
>>         } finally {
>>             stopAllGrids(true);
>>         }
>>     }
>> }
>> ```
>>
>> В Чт, 15/03/2018 в 15:59 +0000, Dmitry Pavlov пишет:
>> > I see now. Thank you.
>> >
>> > Nikolay, could you please merge this change?
>> >
>> > чт, 15 мар. 2018 г. в 18:48, Vyacheslav Daradur <daradu...@gmail.com>:
>> >
>> > > In brief:
>> > > Nodes in *separate* JVMs are shutting down by the computing task
>> > > *StopGridTask* which has sent from *local* JVM *synchronously* that
>> > > means *local* node must wait for task's finish.
>> > >
>> > > At the same time when a node in *separate* JVM executes the received
>> > > *StopGridTask* which *synchronously* calls *G.stop(igniteInstanceName,
>> > > FALSE)* which is waiting for all computing task's finish, including
>> > > *StopGridTask* which has invoked it.
>> > >
>> > > We have some kind of deadlock:
>> > > *Local* node is waiting for the computing task's finish which is
>> > > waiting for finish of execution *G.stop* which is waiting for all
>> > > computing tasks finish including *StopGridTask*.
>> > >
>> > > We have not noticed that before because we use only stopAllGrids() in
>> > > out tests which stop local JVM without waiting for nodes in other
>> > > JVMs.
>> > >
>> > >
>> > >
>> > > On Thu, Mar 15, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov....@gmail.com>
>> > > wrote:
>> > > > Please address comments in PR.
>> > > >
>> > > > I did not fully understood why sync GridStopMessage message was
>> lost, but
>> > > > async will be successfull. Probably we need discuss it briefly.
>> > > >
>> > > > чт, 1 мар. 2018 г. в 12:11, Vyacheslav Daradur <daradu...@gmail.com
>> >:
>> > > > >
>> > > > > Thank you, Dmitry!
>> > > > >
>> > > > > I'll join this review soon.
>> > > > >
>> > > > > On Thu, Mar 1, 2018 at 12:07 PM, Dmitry Pavlov <
>> dpavlov....@gmail.com>
>> > > > > wrote:
>> > > > > > Hi Vyacheslav,
>> > > > > >
>> > > > > > I will take a look, but first of all I am going to review
>> > > > > > https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502  -
>> it is
>> > > > > > impact
>> > > > > > change in testing framework. Hope you also will join to this
>> review .
>> > > > > >
>> > > > > > Sincerely,
>> > > > > > Dmitiry Pavlov
>> > > > > >
>> > > > > >
>> > > > > > чт, 1 мар. 2018 г. в 11:13, Vyacheslav Daradur <
>> daradu...@gmail.com>:
>> > > > > > >
>> > > > > > > Hi, Dmitry, could you please review it, because you are one of
>> the
>> > > > > > > most experienced people in the testing framework.
>> > > > > > >
>> > > > > > > Please see comment in Jira, because it is in pretty-format
>> there.
>> > > > > > >
>> > > > > > > On Thu, Feb 22, 2018 at 11:56 AM, Vyacheslav Daradur
>> > > > > > > <daradu...@gmail.com> wrote:
>> > > > > > > > Hi Igniters!
>> > > > > > > >
>> > > > > > > > I have investigated the issue [1] and found that stopping
>> node in
>> > > > > > > > separate JVM may stuck thread or leave system process alive
>> after
>> > > > > > > > test
>> > > > > > > > finished.
>> > > > > > > > The main reason is *StopGridTask* that we send from node in
>> local
>> > >
>> > > JVM
>> > > > > > > > to node in separate JVM via remote computing.
>> > > > > > > > We send job synchronously to be sure that node will be
>> stopped, but
>> > > > > > > > job calls synchronously *G.stop(igniteInstanceName,
>> cancel))* with
>> > > > > > > > *cancel = false*, that means node must wait to compute jobs
>> before
>> > >
>> > > it
>> > > > > > > > goes down what leads to some kind of deadlock. Using of
>> *cancel =
>> > > > > > > > true* would solve the issue but may break some tests’ logic,
>> for
>> > >
>> > > this
>> > > > > > > > reason, I've reworked the method’s synchronization logic [2].
>> > > > > > > >
>> > > > > > > > We have not noticed that before because we use only
>> > >
>> > > *stopAllGrids()*
>> > > > > > > > in out tests which stop local JVM without waiting for nodes
>> in
>> > >
>> > > other
>> > > > > > > > JVMs.
>> > > > > > > > I believe this fix should reduce the number of flaky tests on
>> > > > > > > > TeamCity, especially which fails because of a cluster from
>> the
>> > > > > > > > previous test has not been stopped properly.
>> > > > > > > >
>> > > > > > > > Ci.tests [3] look a bit better than in master.
>> > > > > > > > Please review prepared PR [2] and share your thoughts.
>> > > > > > > >
>> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5910
>> > > > > > > > [2] https://github.com/apache/ignite/pull/2382
>> > > > > > > > [3]
>> https://ci.ignite.apache.org/viewLog.html?buildId=1105939
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur
>> > > > > > > > <daradu...@gmail.com> wrote:
>> > > > > > > > > Hi Igniters,
>> > > > > > > > >
>> > > > > > > > > Working on my task I found a bug at call the method
>> > >
>> > > #stopGrid(name),
>> > > > > > > > > it produced ClassCastException. I created a ticket[1].
>> > > > > > > > >
>> > > > > > > > > After it was fixed[2] I saw that nodes which was started
>> in a
>> > > > > > > > > separate
>> > > > > > > > > JVM
>> > > > > > > > > could stay in process of operation system.
>> > > > > > > > > It was fixed too, but not sure is it fixed in proper way
>> or not.
>> > > > > > > > >
>> > > > > > > > > Could someone review it?
>> > > > > > > > >
>> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5910
>> > > > > > > > > [2] https://github.com/apache/ignite/pull/2382
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Best Regards, Vyacheslav D.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best Regards, Vyacheslav D.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Best Regards, Vyacheslav D.
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best Regards, Vyacheslav D.
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards, Vyacheslav D.
>> > >



-- 
Best Regards, Vyacheslav D.

Reply via email to