Alexey Serbin created KUDU-3585:
-----------------------------------

             Summary: ClientTest.ClearCacheAndConcurrentWorkload fails from time to time in TSAN builds
                 Key: KUDU-3585
                 URL: https://issues.apache.org/jira/browse/KUDU-3585
             Project: Kudu
          Issue Type: Sub-task
          Components: client, test
    Affects Versions: 1.17.0, 1.16.0, 1.15.0, 1.14.0
            Reporter: Alexey Serbin

The scenario sometimes fails in TSAN builds with output like that cited below. The root cause seems to be RPC queue overflows at kudu-master and kudu-tserver: both spend much more time on regular requests when built with TSAN instrumentation, and resetting the client's meta-cache too often induces a lot of GetTableLocations requests. Serving those requests consumes a lot of CPU and keeps many threads busy. Since the scenario uses an internal mini-cluster (i.e. all masters and tablet servers are part of a single process), this affects kudu-tserver RPC worker threads as well, so many requests accumulate in the RPC queues.

{noformat}
src/kudu/client/client-test.cc:408: Failure
Expected equality of these values:
  0
  server->server()->rpc_server()-> service_pool("kudu.tserver.TabletServerService")-> RpcsQueueOverflowMetric()->value()
    Which is: 1
src/kudu/client/client-test.cc:584: Failure
Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the current thread.
  Actual: it does.
src/kudu/client/client-test.cc:2466: Failure
Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't generate new fatal failures in the current thread.
  Actual: it does.
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)