[ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenyuLi updated HBASE-28589:
-----------------------------
    Affects Version/s: 2.6.0
                       2.5.0
                       2.4.0
                       3.0.0
                           (was: 1.2.0)
                           (was: 1.3.0)
                           (was: 1.4.0)
                           (was: 1.5.0)
             Priority: Major  (was: Minor)

> Client Does not Stop Retrying after DoNotRetryException
> -------------------------------------------------------
>
>                 Key: HBASE-28589
>                 URL: https://issues.apache.org/jira/browse/HBASE-28589
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0, 2.4.0, 2.5.0, 2.6.0, 3.0.0
>            Reporter: ZhenyuLi
>            Priority: Major
>
> I recently discovered that the fix for HBASE-14598 does not completely
> resolve the issue. That fix had two goals. First, when a Scan/Get RPC
> attempts to allocate a very large array that could lead to an
> out-of-memory (OOM) error, the server checks the size of the array before
> allocation and throws an exception directly, which prevents the region
> server from crashing and avoids possible cascading failures. Second, the
> developer intended the client to stop retrying after such a failure,
> because retrying will not resolve the issue.
> However, the fix works by throwing a DoNotRetryException. After
> ByteBufferOutputStream.write throws the DoNotRetryException, it travels up
> the call stack, listed here from the throwing method outward
> (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo -->
> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), and is
> ultimately caught in CallRunner.run, which only logs it. Consequently, the
> DoNotRetryException is never sent back to the client. Instead, the client
> receives a generic exception for the failed RPC request and keeps
> retrying, which is not the desired behavior.
> I have reproduced this on a cluster.
> The CallRunner code shows that a DoNotRetryException thrown from
> call.setResponse is swallowed by the error handler, which only writes a
> log message.
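
The behavior described above can be illustrated with a minimal, self-contained sketch. It is not HBase code: the class SwallowedDoNotRetryDemo, the nested DoNotRetryException stand-in, the MAX_ARRAY_SIZE limit, and every method name are hypothetical and exist only to model the report. It assumes a client that stops retrying only when it actually receives the do-not-retry exception.

```java
import java.io.IOException;
import java.util.function.IntFunction;

public class SwallowedDoNotRetryDemo {
    /** Hypothetical stand-in for the do-not-retry exception named in the report. */
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String message) { super(message); }
    }

    /** Hypothetical size limit; the real limit and where it is checked may differ. */
    static final int MAX_ARRAY_SIZE = 1024;

    /** Models the size check: fail fast instead of attempting a huge allocation. */
    static byte[] buildResponse(int requestedSize) throws DoNotRetryException {
        if (requestedSize > MAX_ARRAY_SIZE) {
            throw new DoNotRetryException("requested " + requestedSize + " bytes, limit " + MAX_ARRAY_SIZE);
        }
        return new byte[requestedSize];
    }

    /** Models the reported behavior: the handler only logs, so the client sees a generic error. */
    static IOException runSwallowing(int requestedSize) {
        try {
            buildResponse(requestedSize);
            return null; // success, nothing to report
        } catch (Exception e) {
            System.err.println("LOG: " + e); // the original exception type is lost here
            return new IOException("call failed");
        }
    }

    /** Models the desired behavior: the do-not-retry exception reaches the client. */
    static IOException runPropagating(int requestedSize) {
        try {
            buildResponse(requestedSize);
            return null;
        } catch (DoNotRetryException e) {
            return e;
        }
    }

    /** A simplified client loop: retry until success, a do-not-retry error, or the retry limit. */
    static int clientAttempts(IntFunction<IOException> server, int requestedSize, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            IOException error = server.apply(requestedSize);
            if (error == null || error instanceof DoNotRetryException) {
                break;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(clientAttempts(SwallowedDoNotRetryDemo::runSwallowing, 4096, 5));  // prints 5
        System.out.println(clientAttempts(SwallowedDoNotRetryDemo::runPropagating, 4096, 5)); // prints 1
    }
}
```

In this model the swallowing handler makes the client use all five attempts, while the propagating handler stops it after the first one. A real fix would also have to carry the exception type through the RPC response, which this sketch does not model.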



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
