Checked the approach to use msg.version() as a striped pool index for tx messages processing. Seems, this fixes the problem for cases when originating node is not a primary (tx creation happens at the striped pool). But, what about cases when transaction started from the user thread, not a striped pool?
for example: at org.apache.ignite.internal.processors.cache.transactions.IgniteTxStateImpl.<init>(IgniteTxStateImpl.java:89) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.<init>(IgniteTxLocalAdapter.java:208) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.<init>(GridDhtTxLocalAdapter.java:144) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.<init>(GridNearTxLocal.java:285) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.newTx(IgniteTxManager.java:787) at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart0(IgniteTransactionsImpl.java:187) at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart(IgniteTransactionsImpl.java:70) at org.apache.ignite.internal.processors.cache.transactions.TxMultiCacheAsyncOpsTest.testCommitAfterAsyncGet(TxMultiCacheAsyncOpsTest.java:121) Such code generates/fills non-threadsafe collections which will be used at the striped pool later. Should we always create/modify transactions at the striped pool threads? On Fri, May 19, 2023 at 8:29 PM Anton Vinogradov <a...@apache.org> wrote: > >> This invariant is violated in many places. > At least in half of the messages: > > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedLockRequest#partition > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedUnlockRequest#partition > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxFinishResponse#partition > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareResponse#partition > > +1 to use nearXidVersion.hash() as a stripe index, but, listed methods > made me doubt. > > On Fri, May 19, 2023 at 7:32 PM Ivan Daschinsky <ivanda...@gmail.com> > wrote: > >> >> Tx processing is supposed to be thread bound by hashing the version to >> a >> partition >> This invariant is violated in many places. The most notorious example is >> tx >> recovery. >> >> Another example: I just added an assertion that checks tId of a creator >> thread with tId of an accessor thread. >> TxMultiCacheAsyncOpsTest fails immediately on processing of a tx prepare >> request. Looks like a big issue, IMO >> >> >> пт, 19 мая 2023 г. в 19:11, Alexei Scherbakov < >> alexey.scherbak...@gmail.com >> >: >> >> > Tx processing is supposed to be thread bound by hashing the version to a >> > partition, see methods like [1] >> > If for some cases this invariant is broken, this should be fixed. >> > >> > [1] >> > >> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition >> > >> > пт, 19 мая 2023 г. в 15:57, Anton Vinogradov <a...@apache.org>: >> > >> > > Igniters, >> > > >> > > My team was faced with node failure [1] because of non-threadsafe >> > > collections usage. >> > > >> > > IgniteTxStateImpl's fields >> > > - activeCacheIds >> > > - txMap >> > > are not thread safe, but are widely used from different threads >> without >> > the >> > > proper sync. >> > > >> > > The main question is ... why? >> > > >> > > According to the research, we have no guarantee that tx will be >> processed >> > > at the single thread. >> > > It may be processed at the several! threads at the striped pool and at >> > the >> > > tx recovery thread as well. >> > > >> > > Thread at the striped pool will be selected by the message's >> partition() >> > > method, which can be calculated like this: >> > > - return keys != null && !keys.isEmpty() ? keys.get(0).partition() : >> -1; >> > > - return U.safeAbs(version().hashCode()); >> > > - ..., >> > > so, no guarantee it is processed at the same thread (proven by tests). >> > > >> > > Seems, we MAY lose the data. >> > > For example, ignoring some or all keys from txMap at commit. >> > > >> > > If anyone knows why this is not a problem (I mean sync lack, not data >> > loss) >> > > or how to fix this properly, please give me a hint, or correct my >> > > conclusions if necessary. >> > > >> > > [1] https://issues.apache.org/jira/browse/IGNITE-19445 >> > > >> > >> > >> > -- >> > >> > Best regards, >> > Alexei Scherbakov >> > >> >> >> -- >> Sincerely yours, Ivan Daschinskiy >> >