Wei-Chiu Chuang created HDDS-10497:
--------------------------------------

             Summary: [hsync] Refresh block token immediately if block token 
expires
                 Key: HDDS-10497
                 URL: https://issues.apache.org/jira/browse/HDDS-10497
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Wei-Chiu Chuang


HDDS-9734 and HDDS-7930 improve error handling when an input stream fails to 
read due to an expired block token. However, the client refreshes the block 
token only after retrying every datanode in the pipeline, which not only adds 
log spew but also increases 99.9th percentile tail latency.

The input stream should request a new block token immediately after a read 
fails because of an expired block token.
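
A minimal sketch of the proposed behavior, for illustration only. It is not 
the Ozone client code: TokenRefreshSketch, ExpiredTokenException, ChunkReader 
and readWithImmediateRefresh are hypothetical names, and the token refresh is 
modeled as a plain Supplier. The idea is that an expired-token failure 
triggers one refresh and a retry against the same datanode, instead of 
failing over to every other datanode with the same stale token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types exist in Ozone under these names.
final class TokenRefreshSketch {

  /** Stand-in for a BLOCK_TOKEN_VERIFICATION_FAILED response. */
  static class ExpiredTokenException extends Exception {
    ExpiredTokenException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for a single ReadChunk call against one datanode. */
  interface ChunkReader {
    byte[] read(String datanode, String token) throws Exception;
  }

  static byte[] readWithImmediateRefresh(List<String> datanodes, String token,
      Supplier<String> refreshToken, ChunkReader reader) throws Exception {
    boolean refreshed = false;
    Exception last = null;
    int i = 0;
    while (i < datanodes.size()) {
      try {
        return reader.read(datanodes.get(i), token);
      } catch (ExpiredTokenException e) {
        if (refreshed) {
          throw e; // the fresh token was rejected too, so give up
        }
        // An expired token fails on every datanode, so refresh right away
        // and retry the same datanode (i is not advanced).
        token = refreshToken.get();
        refreshed = true;
      } catch (Exception e) {
        // Any other failure: remember it and fail over to the next datanode.
        last = e;
        i++;
      }
    }
    throw last != null ? last : new IllegalStateException("no datanodes");
  }

  public static void main(String[] args) throws Exception {
    List<String> attempts = new ArrayList<>();
    ChunkReader reader = (dn, tok) -> {
      attempts.add(dn + ":" + tok);
      if (tok.equals("old")) {
        throw new ExpiredTokenException("Expired token");
      }
      return new byte[] {1};
    };
    readWithImmediateRefresh(List.of("dn1", "dn2", "dn3"), "old",
        () -> "new", reader);
    System.out.println(attempts); // prints [dn1:old, dn1:new]
  }
}
```

In this sketch the read succeeds after two attempts on the first datanode, 
whereas trying each datanode first would log one failure per datanode before 
the token is refreshed. Limiting the refresh to one attempt is a design choice 
of the sketch, made so that a persistently rejected token cannot loop forever.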

Relevant logs:

{noformat}
2024-03-08 23:03:20,109 WARN 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk 
113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061 
bcsId: 129941 from 
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133);
 will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
(auth:SIMPLE)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
        at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
        at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
        at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
        at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
        at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
        at 
org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
...

2024-03-08 23:03:20,112 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 
04646212-c013-4f8c-9ada-80580c189135, Nodes: 
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
 ReplicationConfig: STANDALONE/THREE, State:OPEN, 
leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, 
CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].

2024-03-08 23:03:20,113 WARN 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk 
113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061 
bcsId: 129941 from 
98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18);
 will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
(auth:SIMPLE)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
...
2024-03-08 23:03:20,116 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 
04646212-c013-4f8c-9ada-80580c189135, Nodes: 
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
 ReplicationConfig: STANDALONE/THREE, State:OPEN, 
leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, 
CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
2024-03-08 23:03:20,390 INFO 
org.apache.hadoop.hdds.scm.storage.BlockInputStream: Unable to read information 
for block conID: 3 locID: 113750153625603098 bcsId: 459126 from pipeline 
PipelineID=eb1d2690-75a6-48d7-9eec-60675b907fc0: 
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
(auth:SIMPLE)
{noformat}



