Checked the approach to use msg.version() as a striped pool index for tx
messages processing.
Seems, this fixes the problem for cases when originating node is not a
primary (tx creation happens at the striped pool).
But, what about cases when transaction started from the user thread, not a
striped pool?

for example:
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxStateImpl.<init>(IgniteTxStateImpl.java:89)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.<init>(IgniteTxLocalAdapter.java:208)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.<init>(GridDhtTxLocalAdapter.java:144)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.<init>(GridNearTxLocal.java:285)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.newTx(IgniteTxManager.java:787)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart0(IgniteTransactionsImpl.java:187)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart(IgniteTransactionsImpl.java:70)
at
org.apache.ignite.internal.processors.cache.transactions.TxMultiCacheAsyncOpsTest.testCommitAfterAsyncGet(TxMultiCacheAsyncOpsTest.java:121)

Such code generates/fills non-threadsafe collections which will be used at
the striped pool later.
Should we always create/modify transactions at the striped pool threads?

On Fri, May 19, 2023 at 8:29 PM Anton Vinogradov <a...@apache.org> wrote:

> >> This invariant is violated in many places.
> At least in half of the messages:
>
>
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedLockRequest#partition
>
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedUnlockRequest#partition
>
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxFinishResponse#partition
>
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareResponse#partition
>
> +1 to use nearXidVersion.hash() as a stripe index, but, listed methods
> made me doubt.
>
> On Fri, May 19, 2023 at 7:32 PM Ivan Daschinsky <ivanda...@gmail.com>
> wrote:
>
>> >> Tx processing is supposed to be thread bound by hashing the version to
>> a
>> partition
>> This invariant is violated in many places. The most notorious example is
>> tx
>> recovery.
>>
>> Another example: I just added an assertion that checks tId of a creator
>> thread with tId of an accessor thread.
>> TxMultiCacheAsyncOpsTest fails immediately on processing of a tx prepare
>> request. Looks like a big issue, IMO
>>
>>
>> пт, 19 мая 2023 г. в 19:11, Alexei Scherbakov <
>> alexey.scherbak...@gmail.com
>> >:
>>
>> > Tx processing is supposed to be thread bound by hashing the version to a
>> > partition, see methods like [1]
>> > If for some cases this invariant is broken, this should be fixed.
>> >
>> > [1]
>> >
>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition
>> >
>> > пт, 19 мая 2023 г. в 15:57, Anton Vinogradov <a...@apache.org>:
>> >
>> > > Igniters,
>> > >
>> > > My team was faced with node failure [1] because of non-threadsafe
>> > > collections usage.
>> > >
>> > > IgniteTxStateImpl's fields
>> > > - activeCacheIds
>> > > - txMap
>> > > are not thread safe, but are widely used from different threads
>> without
>> > the
>> > > proper sync.
>> > >
>> > > The main question is ... why?
>> > >
>> > > According to the research, we have no guarantee that tx will be
>> processed
>> > > at the single thread.
>> > > It may be processed at the several! threads at the striped pool and at
>> > the
>> > > tx recovery thread as well.
>> > >
>> > > Thread at the striped pool will be selected by the message's
>> partition()
>> > > method, which can be calculated like this:
>> > > - return keys != null && !keys.isEmpty() ? keys.get(0).partition() :
>> -1;
>> > > - return U.safeAbs(version().hashCode());
>> > > - ...,
>> > > so, no guarantee it is processed at the same thread (proven by tests).
>> > >
>> > > Seems, we MAY lose the data.
>> > > For example, ignoring some or all keys from txMap at commit.
>> > >
>> > > If anyone knows why this is not a problem (I mean sync lack, not data
>> > loss)
>> > > or how to fix this properly, please give me a hint, or correct my
>> > > conclusions if necessary.
>> > >
>> > > [1] https://issues.apache.org/jira/browse/IGNITE-19445
>> > >
>> >
>> >
>> > --
>> >
>> > Best regards,
>> > Alexei Scherbakov
>> >
>>
>>
>> --
>> Sincerely yours, Ivan Daschinskiy
>>
>

Reply via email to