[ https://issues.apache.org/jira/browse/GEODE-10395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mario Kevo resolved GEODE-10395. -------------------------------- Fix Version/s: 1.16.0 Resolution: Fixed > TXLockIdImpl memory leak after CommitConflictException from another node > ------------------------------------------------------------------------ > > Key: GEODE-10395 > URL: https://issues.apache.org/jira/browse/GEODE-10395 > Project: Geode > Issue Type: Bug > Affects Versions: 1.14.0, 1.15.0 > Reporter: Eugene Nedzvetsky > Assignee: Mario Kevo > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > org.apache.geode.internal.cache.locks.TXLockServiceImpl#txLock:120 adds > TXLockIdImpl objects to TXLockServiceImpl#txLockIdList. > {code:java} > synchronized (txLockIdList) { > txLockId = new TXLockIdImpl(dlock.getDistributionManager().getId()); > txLockIdList.add(txLockId); > } > {code} > These objects will not be removed from this list if dlock.acquireTryLocks > returned false. > {code:java} > gotLocks = dlock.acquireTryLocks(batch, TIMEOUT_MILLIS, LEASE_MILLIS, > keyIfFail); > if (gotLocks) { // ...otherwise race can occur between tryLocks and > readLock > acquireRecoveryReadLock(); > } else if (keyIfFail[0] != null) { > throw new CommitConflictException( > String.format("Concurrent transaction commit detected %s", > keyIfFail[0])); > } else { > throw new CommitConflictException( > String.format("Failed to request try locks from grantor: %s", > dlock.getLockGrantorId())); > } > {code} > It throws CommitConflictException and after that system doesn't have a > txLockId reference and this txLockId will be never removed from this list. > It produces critical performance degradation. txLockIdList has a few hundred > thousand txLocks after a few weeks and TXLockServiceImpl#release iterates > this list 2 times on every tx commit and the same time "synchronized > (txLockIdList)" locks other threads. > TXLockIdImpl#equals works really slow because it checks bunch of variables in > memberId.equals(that.memberId). > {code:java} > public void release(TXLockId txLockId) { > synchronized (txLockIdList) { > if (!txLockIdList.contains(txLockId)) { > // TXLockService.destroyServices can be invoked in cache.close(). > // Other P2P threads could process message such as TXCommitMessage > afterwards, > // and invoke TXLockService.createDTLS(). It could create a new > TXLockService > // which will have a new empty list (txLockIdList) and it will not > // contain the originally added txLockId > throw new IllegalArgumentException( > String.format("Invalid txLockId not found: %s", > txLockId)); > } > dlock.releaseTryLocks(txLockId, () -> { > return recovering; > }); > txLockIdList.remove(txLockId); > releaseRecoveryReadLock(); > } > } > {code} > I think TXLockServiceImpl#txLock should remove this txLockId from > TXLockServiceImpl#txLockIdList in case of CommitConflictException: > {code:java} > if (gotLocks) { // ...otherwise race can occur between tryLocks and readLock > acquireRecoveryReadLock(); > } else if (keyIfFail[0] != null) { > synchronized (this.txLockIdList) { > this.txLockIdList.remove(txLockId); > } > throw new CommitConflictException( > String.format("Concurrent transaction commit detected > %s", > keyIfFail[0])); > } else { > synchronized (this.txLockIdList) { > this.txLockIdList.remove(txLockId); > } > throw new CommitConflictException( > String.format("Failed to request try locks from > grantor: %s", > this.dlock.getLockGrantorId())); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)