noAfraidStart opened a new issue, #13020:
URL: https://github.com/apache/lucene/issues/13020
### Description
We use HDFS for file storage and write data through the `softUpdateDocuments` API.
We have found that during concurrent writes, the .dvd files selected for
merging can be deleted by other write/flush threads, which leads to a
`FileNotFoundException`. If we switch to the `updateDocuments` API for writing
data instead, the exception does not occur.
We tested Lucene 9.5.0 through 9.8.0, and all of these versions reproduce the
exception.
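A minimal sketch of the write pattern that triggers this for us. The field names (`id`, `__soft_deletes`) and the local `FSDirectory` are illustrative assumptions: in our environment the `Directory` is an HDFS-backed implementation, and this sketch is not a guaranteed standalone reproducer.

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SoftUpdateRepro {
  public static void main(String[] args) throws Exception {
    // In our setup this is an HDFS-backed Directory; FSDirectory is used
    // here only to keep the sketch self-contained.
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/soft-update-repro"))) {
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setSoftDeletesField("__soft_deletes"); // required for soft updates
      try (IndexWriter writer = new IndexWriter(dir, cfg)) {
        Runnable task = () -> {
          try {
            for (int i = 0; i < 10_000; i++) {
              String id = Integer.toString(i % 100); // re-update the same ids
              Document doc = new Document();
              doc.add(new StringField("id", id, Field.Store.NO));
              // Soft updates record deletions as doc-values updates, producing
              // .dvd/.dvm segment files; with updateDocuments(...) instead,
              // we do not see the FileNotFoundException.
              writer.softUpdateDocuments(
                  new Term("id", id),
                  java.util.List.of(doc),
                  new NumericDocValuesField("__soft_deletes", 1));
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task); // concurrent writers trigger the race
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        writer.commit();
      }
    }
  }
}
```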
The exception is as follows:
```
java.io.FileNotFoundException: File does not exist: /search/test/1/index/_l5_1_Lucene90_0.dvd
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2308)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:800)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:479)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1403)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1390)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1379)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:366)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:284)
	at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1299)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1245)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1224)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1405)
	at org.apache.hadoop.hdfs.DFSInputStream.doPread(DFSInputStream.java:1831)
	at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1785)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1773)
	at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:124)
	at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:117)
	at org.apache.lucene.store.DataInput.readBytes(DataInput.java:72)
	at org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
	at org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
	at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:618)
	at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer.checkIntegrity(Lucene90DocValuesProducer.java:1640)
	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.checkIntegrity(PerFieldDocValuesFormat.java:380)
	at org.apache.lucene.index.SegmentDocValuesProducer.checkIntegrity(SegmentDocValuesProducer.java:131)
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /search/test/1/index/_l5_1_Lucene90_0.dvd
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:125)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:115)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:205)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1.doCall(FSNamesystem.java:2304)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1.doCall(FSNamesystem.java:2301)
	at org.apache.hadoop.hdfs.server.namenode.LinkResolver.resolve(LinkResolver.java:43)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2308)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:800)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:479)
```
### Version and environment details
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]