[PR] HDDS-10682. EC Reconstruction creates empty chunks at the end of blocks with partial stripes [ozone]

via GitHub Thu, 11 Apr 2024 13:36:38 -0700


sodonnel opened a new pull request, #6515:
URL: https://github.com/apache/ozone/pull/6515


   ## What changes were proposed in this pull request?
   
   
   
   Consider an EC block that is larger than one full stripe, where the last 
stripe is partial and so does not use every replica index.
   
   If a replica that has no data in that final stripe is reconstructed, an 
empty chunk is written to the end of the block's chunk list.
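
To make the layout concrete, here is a minimal sketch of which data replicas should hold a chunk for the final partial stripe. It is not Ozone code: the class `EcChunkLayoutSketch`, the method `expectedChunkLengths`, and its parameters are hypothetical, and it assumes data replicas are indexed from 1 and filled in index order within each stripe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Ozone): models chunk lengths per data replica. */
final class EcChunkLayoutSketch {

  /**
   * Returns the chunk lengths a data replica is expected to hold.
   *
   * @param blockGroupLength total data length of the EC block group, in bytes
   * @param dataCount        number of data replicas in the EC scheme
   * @param chunkSize        EC chunk size, in bytes
   * @param replicaIndex     1-based index of the data replica
   */
  static List<Long> expectedChunkLengths(
      long blockGroupLength, int dataCount, long chunkSize, int replicaIndex) {
    if (replicaIndex < 1 || replicaIndex > dataCount) {
      throw new IllegalArgumentException("replicaIndex must be in 1.." + dataCount);
    }
    long stripeSize = dataCount * chunkSize;
    long fullStripes = blockGroupLength / stripeSize;
    long remainder = blockGroupLength % stripeSize;

    // Every full stripe contributes one full chunk to each data replica.
    List<Long> lengths = new ArrayList<>();
    for (long s = 0; s < fullStripes; s++) {
      lengths.add(chunkSize);
    }

    // In the partial stripe, a replica only gets the bytes that reach its position.
    long offsetInStripe = (replicaIndex - 1) * chunkSize;
    long lastLength = Math.max(0L, Math.min(chunkSize, remainder - offsetInStripe));

    // A replica with no data in the final stripe gets no extra chunk at all,
    // rather than a trailing zero-length one.
    if (lastLength > 0) {
      lengths.add(lastLength);
    }
    return lengths;
  }

  public static void main(String[] args) {
    // 3 data replicas, chunk size 4, length 14: one full stripe plus 2 bytes.
    for (int index = 1; index <= 3; index++) {
      System.out.println("index " + index + ": " + expectedChunkLengths(14, 3, 4, index));
    }
  }
}
```

With these toy numbers, only index 1 has data in the last stripe; indexes 2 and 3 should end after their single full chunk.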
   
   While this does not cause any immediate problem, it can break later 
reconstructions that attempt to read from this block, which then fail with an 
error like:
   
   ```
   2024-04-09 01:06:21,855 ERROR [ec-reconstruct-reader-TID-4]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 7f6f1fc9-ed26-4e19-86b6-47435b027f6a, Nodes: 7f6f1fc9-ed26-4e19-86b6-47435b027f6a(ccycloud-4.quasar-jyswng.root.comops.site/10.140.150.0), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2024-04-09T01:06:21.724509Z[UTC]].
   2024-04-09 01:06:21,859 INFO [ContainerReplicationThread-1]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream: ECBlockReconstructedStripeInputStream{conID: 10007 locID: 113750153625610009}@756a3998: error reading [1], marked as failed
   org.apache.hadoop.ozone.client.io.BadDataLocationException: java.io.IOException: Failed to get chunkInfo[77]: len == 0
           at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:644)
           at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:577)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:834)
   Caused by: java.io.IOException: Failed to get chunkInfo[77]: len == 0
           at org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
           at org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
           at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
           at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
           at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
           at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
           at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
           at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
           at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
           at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
           at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
           at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
           at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
           at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
           at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
           at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
           at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:655)
           at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:631)
           ... 5 more
   ```
   
   If other spare replicas can be used, reconstruction continues; otherwise 
it cannot complete.
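
The `len == 0` failure in the trace above comes from a check on the chunk list returned by GetBlock. The sketch below only illustrates why a trailing empty chunk makes a replica unusable as a reconstruction source; it is not the actual `BlockInputStream.validate` code. The class `ChunkListCheckSketch`, the representation of chunks as a list of lengths, and the 0-based index in the message are assumptions.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in (not Ozone code) for a reader-side chunk list check. */
final class ChunkListCheckSketch {

  /**
   * Rejects a chunk list containing a chunk with no data, using a message
   * shaped like the one in the trace.
   */
  static void validate(List<Long> chunkLengths) throws IOException {
    for (int i = 0; i < chunkLengths.size(); i++) {
      long len = chunkLengths.get(i);
      if (len <= 0) {
        throw new IOException("Failed to get chunkInfo[" + i + "]: len == " + len);
      }
    }
  }

  public static void main(String[] args) {
    try {
      // A replica whose chunk list ends with an empty chunk.
      validate(List.of(4L, 0L));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under these assumptions, a replica whose list ends in a zero-length chunk is rejected as a whole, even though every chunk before it is intact.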
   
   At this stage, I am not sure if this can affect reading a block via the 
normal read path.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10682
   
   ## How was this patch tested?
   
   Added a unit test to reproduce and then fixed the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

