Hi All,

We have recently been testing a single cache node with a large amount of data and frequently run into the following error:
[info 2016/09/29 06:16:06.823 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#6
[info 2016/09/29 06:16:08.232 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] compaction did 6,310 creates and updates in 1,408 ms
[info 2016/09/29 06:16:08.248 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 crf for disk store nsxDiskStore.
[info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 krf for disk store nsxDiskStore.
[info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 drf for disk store nsxDiskStore.
[info 2016/09/29 06:17:03.887 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 drf for disk store nsxDiskStore.
[info 2016/09/29 06:17:03.911 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 crf for disk store nsxDiskStore.
[info 2016/09/29 06:17:04.031 UTC <Idle OplogCompactor> tid=0x19] Created oplog#7 krf for disk store nsxDiskStore.
[info 2016/09/29 06:17:04.314 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#7
[error 2016/09/29 06:17:16.075 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] A DiskAccessException has occurred while writing to the disk for disk store nsxDiskStore. The cache will be closed.
com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)

From the logs it appears there may be a race between the threads "Idle OplogCompactor" and "OplogCompactor nsxDiskStore for oplog oplog#7". I see that both are doing operations related to oplog#7: the former logs the creation of a KRF file, while the latter is trying to access either the DRF or the CRF file. Now, is it possible that "Idle OplogCompactor" closed the DRF/CRF files for oplog#7 as part of creating the KRF for the same oplog? This is what the GemFire docs say about it:

"After the oplog is closed, GemFire also attempts to create a krf file, which contains the key names as well as the offset for the value within the crf file."

Based on the above, it's possible that oplog#7 was already closed and its KRF already created when the compactor tried to access the files.

Have any of you run into this error before? Any suggestions?

Thanks,
Kapil
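
P.S. To illustrate the kind of race I suspect, here is a minimal standalone Java sketch (not GemFire code; the class and thread names are made up, and I am assuming the compactor's flushAll() ultimately writes through a java.io stream, which I have not verified). One thread keeps writing and flushing a FileOutputStream while another thread closes it out from under it; with the JDK I tried this on, the writer fails with "java.io.IOException: Stream Closed", the same message as in the stack trace above.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Standalone sketch: reproduce an IOException("Stream Closed") by closing a
// stream in one thread while another thread is still writing to it. This is
// only meant to mimic the suspected interaction between the krf-creation path
// and the compactor's flush; it does not use any GemFire classes.
public class StreamClosedRaceSketch {
    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("oplog-sketch", ".crf");
        file.deleteOnExit();
        final FileOutputStream out = new FileOutputStream(file);
        final CountDownLatch started = new CountDownLatch(1);

        Thread writer = new Thread(new Runnable() {
            public void run() {
                byte[] chunk = new byte[1024];
                try {
                    for (int i = 0; i < 1_000_000; i++) {
                        out.write(chunk);   // stands in for the compactor's flushAll()
                        out.flush();
                        started.countDown();
                    }
                } catch (IOException e) {
                    // Expected once the other thread closes the stream; on the
                    // JDKs I have seen, the message is "Stream Closed" (it may
                    // differ on other JDK versions).
                    System.out.println("Writer failed: " + e);
                }
            }
        }, "compactor-writer");

        writer.start();
        started.await();   // make sure writing is under way
        out.close();       // stands in for the krf-creation path closing the file
        writer.join();
    }
}

If the "Idle OplogCompactor" really does close oplog#7's CRF/DRF as part of creating its KRF while the other compactor thread is still flushing, this is the failure mode I would expect to see.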