Alexey Serbin created KUDU-3585:
-----------------------------------

             Summary: ClientTest.ClearCacheAndConcurrentWorkload fails from time to time in TSAN builds
                 Key: KUDU-3585
                 URL: https://issues.apache.org/jira/browse/KUDU-3585
             Project: Kudu
          Issue Type: Sub-task
          Components: client, test
    Affects Versions: 1.17.0, 1.16.0, 1.15.0, 1.14.0
            Reporter: Alexey Serbin


The scenario sometimes fails in TSAN builds with output like cited below.

The root cause appears to be RPC queue overflows at kudu-master and 
kudu-tserver. Both spend much more time on regular requests when built with 
TSAN instrumentation, and resetting the client's meta-cache too often induces 
a lot of GetTableLocations requests; serving those consumes a lot of CPU and 
keeps many threads busy.  Since an internal mini-cluster is used in the scenario 
(i.e. all masters and tablet servers are part of a single process), that 
affects kudu-tserver RPC worker threads as well, so many requests accumulate in 
the RPC queues.
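
To illustrate the suspected mechanism, here is a minimal, self-contained sketch 
of a bounded request queue with an overflow counter. It is not Kudu code: 
BoundedRpcQueue, SimulateOverflows, and all the numbers are hypothetical and 
only model the idea that once workers drain requests more slowly than they 
arrive, a fixed-size queue eventually starts rejecting requests.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a bounded RPC service queue (not Kudu's class).
class BoundedRpcQueue {
 public:
  explicit BoundedRpcQueue(std::size_t capacity) : capacity_(capacity) {}

  // Adds a request; when the queue is full, rejects it and counts an overflow.
  bool Enqueue(int request_id) {
    if (queue_.size() >= capacity_) {
      ++overflows_;
      return false;
    }
    queue_.push_back(request_id);
    return true;
  }

  // Models worker threads completing up to 'n' queued requests.
  void Drain(std::size_t n) {
    while (n > 0 && !queue_.empty()) {
      queue_.pop_front();
      --n;
    }
  }

  std::uint64_t overflows() const { return overflows_; }

 private:
  const std::size_t capacity_;
  std::deque<int> queue_;
  std::uint64_t overflows_ = 0;
};

// Runs 'ticks' rounds: 'arrivals' requests come in, then workers complete
// 'served' of them. Returns the number of rejected (overflowed) requests.
std::uint64_t SimulateOverflows(std::size_t capacity, std::size_t arrivals,
                                std::size_t served, int ticks) {
  BoundedRpcQueue queue(capacity);
  int next_id = 0;
  for (int t = 0; t < ticks; ++t) {
    for (std::size_t i = 0; i < arrivals; ++i) {
      queue.Enqueue(next_id++);
    }
    queue.Drain(served);
  }
  return queue.overflows();
}
```

In this model, a service rate that keeps up with arrivals never overflows, 
while a lower service rate (as might happen under TSAN instrumentation) makes 
the counter go above zero after a few rounds, which is the kind of condition 
the failed assertion on RpcsQueueOverflowMetric() detects.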

{noformat}
src/kudu/client/client-test.cc:408: Failure
Expected equality of these values:
  0
  server->server()->rpc_server()->service_pool("kudu.tserver.TabletServerService")->RpcsQueueOverflowMetric()->value()
    Which is: 1
src/kudu/client/client-test.cc:584: Failure
Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the current thread.
  Actual: it does.
src/kudu/client/client-test.cc:2466: Failure
Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't generate new fatal failures in the current thread.
  Actual: it does.
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
