Stephen O'Donnell created HDDS-10682:
----------------------------------------

             Summary: EC Reconstruction creates empty chunks at the end of 
blocks with partial stripes
                 Key: HDDS-10682
                 URL: https://issues.apache.org/jira/browse/HDDS-10682
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


Given an EC block that is larger than 1 full stripe, where the last stripe is 
partial and therefore does not use all of the replica indexes.

If a replica that has no data in that final position is reconstructed, an empty 
chunk is written to the end of the block's chunk list.
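To illustrate the partial-stripe case, here is a minimal sketch (not Ozone code; the class and method names are hypothetical) of how many data indexes actually hold bytes in the final stripe of a block, given the block length, the EC data count, and the chunk size. Any index beyond that count has nothing to contribute to the last stripe, so a reconstructed replica at such an index should not receive an extra chunk:

```java
// Sketch, for illustration only: compute how many data indexes hold at
// least one byte in the final stripe of an EC block. All names here are
// hypothetical and not part of the Ozone codebase.
public class PartialStripeMath {

    static int dataIndexesInLastStripe(long blockLen, int dataNum, int chunkSize) {
        long stripeSize = (long) dataNum * chunkSize;
        long lastStripeLen = blockLen % stripeSize;
        if (lastStripeLen == 0) {
            return dataNum; // block ends exactly on a full stripe
        }
        // ceil(lastStripeLen / chunkSize) data indexes have data in the
        // partial stripe; the remaining indexes have no final chunk at all.
        return (int) ((lastStripeLen + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        int dataNum = 3;
        int chunkSize = 1024 * 1024; // e.g. EC 3+2 with 1 MiB chunks
        // One full stripe (3 MiB) plus a partial stripe of 1.5 MiB: only
        // the first two data indexes hold bytes of the last stripe.
        long blockLen = 3L * chunkSize + chunkSize + chunkSize / 2;
        System.out.println(dataIndexesInLastStripe(blockLen, dataNum, chunkSize));
    }
}
```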

While this does not cause any immediate problem, it can prevent further 
reconstructions that attempt to use this block; they will fail with an error 
like:

{code}
2024-04-09 01:06:21,855 ERROR 
[ec-reconstruct-reader-TID-4]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command GetBlock on the pipeline Pipeline[ Id: 
7f6f1fc9-ed26-4e19-86b6-47435b027f6a, Nodes: 
7f6f1fc9-ed26-4e19-86b6-47435b027f6a(ccycloud-4.quasar-jyswng.root.comops.site/10.140.150.0),
 ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, 
CreationTimestamp2024-04-09T01:06:21.724509Z[UTC]].
2024-04-09 01:06:21,859 INFO 
[ContainerReplicationThread-1]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream:
 ECBlockReconstructedStripeInputStream{conID: 10007 locID: 
113750153625610009}@756a3998: error reading [1], marked as failed
org.apache.hadoop.ozone.client.io.BadDataLocationException: 
java.io.IOException: Failed to get chunkInfo[77]: len == 0
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:644)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:577)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to get chunkInfo[77]: len == 0
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
        at 
org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:655)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:631)
        ... 5 more
{code}

If there are other spare replicas that can be used, reconstruction will 
continue; otherwise it will not be able to complete.
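The reader fails because the GetBlock response contains a chunk whose length is 0, which the client-side validation rejects. One possible write-side mitigation would be to filter zero-length chunks out of the reconstructed block's chunk list before it is persisted. A minimal sketch, assuming a hypothetical ChunkMeta stand-in for the real chunk-info type (this is not the actual ContainerProtos API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a write-side guard: drop any zero-length chunk before the
// reconstructed block's chunk list is persisted. ChunkMeta is a hypothetical
// stand-in for the real chunk-info type, purely for illustration.
public class ChunkListGuard {

    record ChunkMeta(long offset, long len) {}

    static List<ChunkMeta> dropEmptyChunks(List<ChunkMeta> chunks) {
        List<ChunkMeta> out = new ArrayList<>();
        for (ChunkMeta c : chunks) {
            if (c.len() > 0) {
                // An empty chunk trips the reader's "len == 0" validation
                // in BlockInputStream, so omit it from the persisted list.
                out.add(c);
            }
        }
        return out;
    }
}
```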

At this stage, I am not sure if this can affect reading a block via the normal 
read path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
