Hi All,

We have recently been testing a single cache node with a large amount of data and frequently run into the following error:
[info 2016/09/29 06:16:06.823 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#6
[info 2016/09/29 06:16:08.232 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] compaction did 6,310 creates and updates in 1,408 ms
[info 2016/09/29 06:16:08.248 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 crf for disk store nsxDiskStore.
[info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 krf for disk store nsxDiskStore.
[info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 drf for disk store nsxDiskStore.
[info 2016/09/29 06:17:03.887 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 drf for disk store nsxDiskStore.
[info 2016/09/29 06:17:03.911 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 crf for disk store nsxDiskStore.
[info 2016/09/29 06:17:04.031 UTC <Idle OplogCompactor> tid=0x19] Created oplog#7 krf for disk store nsxDiskStore.
[info 2016/09/29 06:17:04.314 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#7
[error 2016/09/29 06:17:16.075 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] A DiskAccessException has occurred while writing to the disk for disk store nsxDiskStore. The cache will be closed.
com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)

From the logs it appears there may be a race between the threads "Idle OplogCompactor" and "OplogCompactor nsxDiskStore for oplog oplog#7". I see that both are doing operations related to oplog#7: the former logs the creation of a KRF file, while the latter is trying to access either the DRF or the CRF file. Now, is it possible that "Idle OplogCompactor" closed the DRF/CRF files for oplog#7 as part of creating the KRF for the same oplog? This is what the GemFire docs say about it:

"After the oplog is closed, GemFire also attempts to create a krf file, which contains the key names as well as the offset for the value within the crf file."

Based on the above, it's possible that oplog#7 was already closed and its KRF already created when the compactor tried to access the files.

Have any of you run into this error before? Any suggestions?

Thanks,
Kapil
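
P.S. To illustrate the kind of race I suspect, here is a minimal standalone Java sketch (not GemFire code; the class and thread names are made up, and I am assuming the compactor's flushAll() ultimately writes through a java.io stream, which I have not verified). One thread keeps writing and flushing a FileOutputStream while another thread closes it out from under it; with the JDK I tried this on, the writer fails with "java.io.IOException: Stream Closed", the same message as in the stack trace above.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Standalone sketch: reproduce an IOException("Stream Closed") by closing a
// stream in one thread while another thread is still writing to it. This is
// only meant to mimic the suspected interaction between the krf-creation path
// and the compactor's flush; it does not use any GemFire classes.
public class StreamClosedRaceSketch {
    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("oplog-sketch", ".crf");
        file.deleteOnExit();
        final FileOutputStream out = new FileOutputStream(file);
        final CountDownLatch started = new CountDownLatch(1);

        Thread writer = new Thread(new Runnable() {
            public void run() {
                byte[] chunk = new byte[1024];
                try {
                    for (int i = 0; i < 1_000_000; i++) {
                        out.write(chunk);   // stands in for the compactor's flushAll()
                        out.flush();
                        started.countDown();
                    }
                } catch (IOException e) {
                    // Expected once the other thread closes the stream; on the
                    // JDKs I have seen, the message is "Stream Closed" (it may
                    // differ on other JDK versions).
                    System.out.println("Writer failed: " + e);
                }
            }
        }, "compactor-writer");

        writer.start();
        started.await();   // make sure writing is under way
        out.close();       // stands in for the krf-creation path closing the file
        writer.join();
    }
}

If the "Idle OplogCompactor" really does close oplog#7's CRF/DRF as part of creating its KRF while the other compactor thread is still flushing, this is the failure mode I would expect to see.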