Igniters,

My team ran into a node failure [1] caused by the use of non-thread-safe collections.
The following fields of IgniteTxStateImpl
- activeCacheIds
- txMap
are not thread safe, yet they are accessed from different threads without proper synchronization. The main question is: why?

According to our research, there is no guarantee that a tx is processed by a single thread. It may be processed by several threads of the striped pool, and by the tx recovery thread as well. The striped pool thread is selected by the message's partition() method, which may be calculated like this:
- return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
- return U.safeAbs(version().hashCode());
- ...

So there is no guarantee that all messages of a tx are processed by the same thread (our tests confirm this).

It seems we MAY lose data, for example by ignoring some or all of the keys in txMap at commit.

If anyone knows why this is not a problem (I mean the lack of synchronization, not the data loss) or how to fix it properly, please give me a hint, or correct my conclusions if necessary.

[1] https://issues.apache.org/jira/browse/IGNITE-19445
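A minimal Java sketch of the routing problem described above, under stated assumptions: TxMessage, stripeOf and runOnStripes are hypothetical names, not Ignite classes, and the stripe is assumed to be Math.floorMod(partition(), stripeCount). A prepare-like message whose first key is in partition 5 and a finish-like message that carries only a version (hashCode 42) land on stripes 5 and 10 of a 16-stripe pool, so the shared per-tx map is written from two different threads. ConcurrentHashMap is used here only to keep the sketch correct; whether it, or explicit locking on the tx, is the right fix for IgniteTxStateImpl is exactly the open question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StripeRoutingSketch {
    /** Hypothetical stand-in for a tx message; NOT an Ignite class. */
    public record TxMessage(List<Integer> keyPartitions, Integer version) {
        /** Mirrors the two partition() variants quoted in the post. */
        int partition() {
            if (keyPartitions != null && !keyPartitions.isEmpty())
                return keyPartitions.get(0);
            // Math.abs is a stand-in for U.safeAbs(...); stripeOf() tolerates negatives anyway.
            return Math.abs(version.hashCode());
        }
    }

    /** Assumed routing rule: non-negative remainder of partition() by the stripe count. */
    public static int stripeOf(TxMessage msg, int stripes) {
        return Math.floorMod(msg.partition(), stripes);
    }

    /**
     * Submits two messages of the same tx to per-stripe single-threaded executors.
     * Both tasks write into one shared per-tx map, so that map must be thread safe.
     */
    public static Map<String, Integer> runOnStripes(int stripes) {
        ExecutorService[] pool = new ExecutorService[stripes];
        for (int i = 0; i < stripes; i++)
            pool[i] = Executors.newSingleThreadExecutor();

        Map<String, Integer> txMap = new ConcurrentHashMap<>(); // shared per-tx state
        TxMessage prepare = new TxMessage(List.of(5), 42);      // routed by first key's partition
        TxMessage finish = new TxMessage(null, 42);             // routed by version hash

        pool[stripeOf(prepare, stripes)].submit(() -> { txMap.put("prepare", stripeOf(prepare, stripes)); });
        pool[stripeOf(finish, stripes)].submit(() -> { txMap.put("finish", stripeOf(finish, stripes)); });

        for (ExecutorService e : pool) {
            e.shutdown();
            try {
                e.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
        return txMap;
    }

    public static void main(String[] args) {
        // Each entry records the stripe (thread) that wrote it.
        System.out.println(runOnStripes(16));
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the two writes would come from different threads with no happens-before relation between them, which is the same pattern as the unsynchronized txMap access described above.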