[ 
https://issues.apache.org/jira/browse/KUDU-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3585:
--------------------------------
    Code Review: https://gerrit.cloudera.org/#/c/21523/

> ClientTest.ClearCacheAndConcurrentWorkload fails from time to time in TSAN 
> builds
> ---------------------------------------------------------------------------------
>
>                 Key: KUDU-3585
>                 URL: https://issues.apache.org/jira/browse/KUDU-3585
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: client, test
>    Affects Versions: 1.14.0, 1.15.0, 1.16.0, 1.17.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> The scenario sometimes fails in TSAN builds with the output cited below.
> The root cause appears to be RPC queue overflows at kudu-master and
> kudu-tserver. Both spend much more time on regular requests when built with
> TSAN instrumentation, and resetting the client's meta-cache too often
> induces a lot of GetTableLocations requests; serving them consumes a lot of
> CPU and keeps many threads busy. Since the scenario uses an internal
> mini-cluster (i.e. all masters and tablet servers are part of a single
> process), that affects kudu-tserver RPC worker threads as well, so many
> requests accumulate in the RPC queues.
> {noformat}
> src/kudu/client/client-test.cc:408: Failure
> Expected equality of these values:
>   0
>   server->server()->rpc_server()->service_pool("kudu.tserver.TabletServerService")->RpcsQueueOverflowMetric()->value()
>     Which is: 1
> src/kudu/client/client-test.cc:584: Failure
> Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> src/kudu/client/client-test.cc:2466: Failure
> Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> {noformat}
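
The overflow mechanism described above can be illustrated with a toy model. This is only a sketch and not Kudu code: `BoundedRpcQueue` and `SimulateOverflows` are hypothetical names, and it assumes a fixed-capacity queue that rejects a request and increments a counter when full, which is roughly what a nonzero queue-overflow metric reports. When requests arrive faster than workers drain them (as with slow TSAN-instrumented handlers plus a burst of GetTableLocations calls), the counter becomes nonzero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a service pool's inbound RPC queue (not Kudu code).
class BoundedRpcQueue {
 public:
  explicit BoundedRpcQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ > 0);
  }

  // Enqueues a request; if the queue is full, rejects it and counts an overflow.
  bool Offer(int request_id) {
    if (queue_.size() >= capacity_) {
      ++overflows_;
      return false;
    }
    queue_.push_back(request_id);
    return true;
  }

  // A worker thread takes the oldest request, if there is one.
  bool Poll() {
    if (queue_.empty()) return false;
    queue_.pop_front();
    return true;
  }

  std::uint64_t overflows() const { return overflows_; }

 private:
  const std::size_t capacity_;
  std::deque<int> queue_;
  std::uint64_t overflows_ = 0;
};

// Each tick, `arrivals_per_tick` requests arrive, then workers serve up to
// `served_per_tick` of them. Returns the total number of rejected requests.
std::uint64_t SimulateOverflows(std::size_t capacity, int arrivals_per_tick,
                                int served_per_tick, int ticks) {
  BoundedRpcQueue queue(capacity);
  int next_id = 0;
  for (int t = 0; t < ticks; ++t) {
    for (int i = 0; i < arrivals_per_tick; ++i) queue.Offer(next_id++);
    for (int i = 0; i < served_per_tick; ++i) queue.Poll();
  }
  return queue.overflows();
}
```

In this model, `SimulateOverflows(4, 2, 2, 10)` stays at zero because workers keep up with arrivals, while `SimulateOverflows(4, 3, 1, 3)` reports overflows once the backlog reaches the queue capacity.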



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
