Re: Async cache groups rebalance not started with rebalanceOrder ZERO

Maxim Muzafarov Tue, 17 Jul 2018 09:02:13 -0700

Yakov,

But we already have DFLT_SEND_RETRY_CNT and DFLT_SEND_RETRY_DELAY for
configuring our CommunicationSPI behavior. What if a user configures these
parameters in their own way and then sees a lot of WARN messages in the log
that make no sense?


Maybe we could use GridCachePartitionExchangeManager#forceRebalance (or maybe
forceReassign) if rebalancing still fails after all those retries. What do
you think?
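
To make the proposal concrete, here is a minimal sketch of the flow discussed
in this thread: wait for a supply message with a timeout, log a warning and
repeat the demand, and fall back to a forced rebalance once the attempts are
exhausted. It is plain Java and not Ignite code: demandWithRetry, Outcome, the
sendDemand supplier and the fallback callback are hypothetical names, and the
fallback only stands in for something like forceRebalance.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class DemandRetrySketch {
    /** How the demand loop ended (hypothetical type, for illustration only). */
    public enum Outcome { SUPPLIED, FORCED_REBALANCE }

    /**
     * Sends a demand, waits up to timeoutMs for the supply and repeats up to
     * maxAttempts times. If no attempt succeeds, runs the fallback, which is
     * a stand-in for a forced rebalance.
     */
    public static Outcome demandWithRetry(Supplier<CompletableFuture<String>> sendDemand,
                                          int maxAttempts, long timeoutMs, Runnable fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<String> supply = sendDemand.get();
            try {
                supply.get(timeoutMs, TimeUnit.MILLISECONDS);
                return Outcome.SUPPLIED;
            } catch (TimeoutException e) {
                // No supply in time: give up on this attempt, warn and demand again.
                supply.cancel(true);
                System.err.println("Failed to wait for supply message from node within "
                    + timeoutMs + " ms [attempt=" + attempt + "/" + maxAttempts + "]");
            } catch (ExecutionException e) {
                System.err.println("Demand failed [attempt=" + attempt
                    + ", err=" + e.getCause() + "]");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for supply", e);
            }
        }
        // All attempts exhausted: hand over to the fallback.
        fallback.run();
        return Outcome.FORCED_REBALANCE;
    }
}
```

With a bounded number of attempts the WARN output stays limited, and the
attempt count and timeout could be taken from whatever the user has already
configured. That mapping is an assumption and is not shown here.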



Mon, 16 Jul 2018 at 21:12, Yakov Zhdanov <yzhda...@gridgain.com>:

> Maxim, I looked at the code you provided. I think we need to add some
> timeout validation on the demander side: if no supply message arrives
> within 30 secs, output a warning to the logs and repeat the demanding
> process. This should apply to any demand message throughout the
> rebalancing process, not only the 1st one.
>
> You can use the following message
>
> Failed to wait for supply message from node within 30 secs [cache=C,
> partId=XX]
>
> Alex Goncharuk, do you have any comments here?
>
> Yakov Zhdanov
> www.gridgain.com
>
> 2018-07-14 19:45 GMT+03:00 Maxim Muzafarov <maxmu...@gmail.com>:
>
> > Yakov,
> >
> > Yes, you're right. The whole rebalancing progress will be stopped.
> >
> > Actually, the rebalancing order doesn't matter; you're right about that
> > too. The Javadoc just describes the idea of how rebalance should work for
> > caches, but in fact it doesn't work as described. Personally, I'd prefer
> > to start the rebalance of each cache group asynchronously and
> > independently.
> >
> > Please, look at my reproducer [1].
> >
> > Scenario:
> > A cluster with two REPLICATED caches.
> > Start a new node.
> > Rebalance of the first cache group fails to start (e.g. network issues) -
> > it's OK.
> > Rebalance of the second cache group will never be started - all further
> > progress gets stuck (I think rebalance here should be started!).
> >
> >
> > [1]
> > https://github.com/Mmuzaf/ignite/blob/rebalance-cancel/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/rebalancing/GridCacheRebalancingCancelSelfTest.java
> >
> > Fri, 13 Jul 2018 at 17:46, Yakov Zhdanov <yzhda...@apache.org>:
> >
> > > Maxim, I do not understand the problem. Imagine I do not have any
> > > ordering but rebalancing of some cache fails to start - so in my
> > > understanding overall rebalancing progress becomes blocked. Is that true?
> > >
> > > Can you please provide a reproducer for your problem?
> > >
> > > --Yakov
> > >
> > > 2018-07-09 16:42 GMT+03:00 Maxim Muzafarov <maxmu...@gmail.com>:
> > >
> > > > Hello Igniters,
> > > >
> > > > Each cache group has a “rebalance order” property. As the javadoc for
> > > > getRebalanceOrder() says: “Note that cache with order {@code 0} does
> > > > not participate in ordering. This means that cache with rebalance
> > > > order {@code 0} will never wait for any other caches. All caches with
> > > > order {@code 0} will be rebalanced right away concurrently with each
> > > > other and ordered rebalance processes. If not set, cache order is 0,
> > > > i.e. rebalancing is not ordered.”
> > > >
> > > > In fact, GridCachePartitionExchangeManager always builds a chain of
> > > > rebalancing cache groups to start (even for cache order ZERO):
> > > >
> > > > ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 -> cacheR1.
> > > >
> > > > If one of these groups fails to start, the groups after it will never
> > > > be run.
> > > >
> > > > *Question 1*: Should we fix the javadoc description or create a bug
> > > > for fixing such rebalance behavior?
> > > >
> > > > [1]
> > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
> > > >
> > >
> > --
> > --
> > Maxim Muzafarov
> >
>
-- 
--
Maxim Muzafarov
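
For readers of the thread, the difference between the chained start described
in the original question and the independent start preferred above can be
sketched in plain Java. This is only an illustration under stated assumptions,
not the GridCachePartitionExchangeManager implementation: startChained,
startIndependently and the start function are hypothetical names, and a group
that "fails to start" is modeled as a future completed exceptionally (a future
that never completes would block the chain in the same way).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class RebalanceStartSketch {
    /** Starts groups one after another; a failed group keeps all later groups from starting. */
    public static List<String> startChained(List<String> groups,
                                            Function<String, CompletableFuture<Void>> start) {
        List<String> started = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (String grp : groups) {
            // The next group is started only if everything before it completed normally.
            chain = chain.thenCompose(ignored -> {
                started.add(grp);
                return start.apply(grp);
            });
        }
        chain.exceptionally(e -> null).join();
        return started;
    }

    /** Starts every group on its own; a failed group affects only itself. */
    public static List<String> startIndependently(List<String> groups,
                                                  Function<String, CompletableFuture<Void>> start) {
        List<String> started = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> futs = new ArrayList<>();
        for (String grp : groups) {
            started.add(grp);
            // Swallow the failure of this group so it cannot affect the others.
            futs.add(start.apply(grp).exceptionally(e -> null));
        }
        CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
        return started;
    }
}
```

For the groups ignite-sys-cache, cacheR, cacheR3 with cacheR failing,
startChained reports only ignite-sys-cache and cacheR as started, while
startIndependently reports all three.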
