[ https://issues.apache.org/jira/browse/KUDU-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3585:
--------------------------------
Code Review: https://gerrit.cloudera.org/#/c/21523/

> ClientTest.ClearCacheAndConcurrentWorkload fails from time to time in TSAN builds
> ---------------------------------------------------------------------------------
>
>                 Key: KUDU-3585
>                 URL: https://issues.apache.org/jira/browse/KUDU-3585
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: client, test
>    Affects Versions: 1.14.0, 1.15.0, 1.16.0, 1.17.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> The scenario sometimes fails in TSAN builds with output like that cited below.
> It seems the root cause was RPC queue overflows at kudu-master and
> kudu-tserver. Both spend much more time on regular requests when built with
> TSAN instrumentation, and resetting the client's meta-cache too often
> induces a lot of GetTableLocations requests; serving them consumes a lot of
> CPU and keeps many threads busy. Since the scenario uses an internal
> mini-cluster (i.e. all masters and tablet servers are part of a single
> process), this affects kudu-tserver RPC worker threads as well, so many
> requests accumulate in the RPC queues.
> {noformat}
> src/kudu/client/client-test.cc:408: Failure
> Expected equality of these values:
>   0
>   server->server()->rpc_server()->
>     service_pool("kudu.tserver.TabletServerService")->
>       RpcsQueueOverflowMetric()->value()
>     Which is: 1
> src/kudu/client/client-test.cc:584: Failure
> Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
>
> src/kudu/client/client-test.cc:2466: Failure
> Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
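The failure mode described in KUDU-3585 (requests arriving faster than RPC worker threads can serve them, so a bounded service queue starts rejecting requests and the overflow metric becomes non-zero) can be illustrated with a small deterministic model. This is a hypothetical sketch, not Kudu's actual service pool: `BoundedRpcQueue`, `SimulateOverflows`, and the tick-based arrival and service rates are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model (NOT Kudu's ServicePool): a bounded RPC queue drained by
// worker threads. Requests arriving while the queue is full are rejected and
// counted, loosely analogous to the queue overflow metric that the test's
// CheckNoRpcOverflow() expects to stay at zero.
class BoundedRpcQueue {
 public:
  explicit BoundedRpcQueue(std::size_t max_queue_length)
      : max_queue_length_(max_queue_length) {
    assert(max_queue_length > 0);
  }

  // Enqueues one request. Returns false and bumps the overflow counter
  // if the queue is already at its maximum length.
  bool Offer(int request_id) {
    if (queue_.size() >= max_queue_length_) {
      ++overflows_;
      return false;
    }
    queue_.push_back(request_id);
    return true;
  }

  // Workers dequeue (i.e. finish serving) up to `n` requests.
  void Drain(std::size_t n) {
    while (n > 0 && !queue_.empty()) {
      queue_.pop_front();
      --n;
    }
  }

  std::uint64_t overflows() const { return overflows_; }
  std::size_t size() const { return queue_.size(); }

 private:
  std::size_t max_queue_length_;
  std::deque<int> queue_;
  std::uint64_t overflows_ = 0;
};

// Runs `ticks` rounds: in each round `arrivals_per_tick` requests arrive
// (e.g. GetTableLocations calls after a meta-cache reset), then workers
// serve up to `served_per_tick` of them. Returns the total overflow count.
std::uint64_t SimulateOverflows(std::size_t max_queue_length, int ticks,
                                std::size_t arrivals_per_tick,
                                std::size_t served_per_tick) {
  BoundedRpcQueue q(max_queue_length);
  int next_id = 0;
  for (int t = 0; t < ticks; ++t) {
    for (std::size_t i = 0; i < arrivals_per_tick; ++i) {
      q.Offer(next_id++);
    }
    q.Drain(served_per_tick);
  }
  return q.overflows();
}
```

For example, with a queue limit of 50 and 10 arrivals per tick, `SimulateOverflows(50, 100, 10, 10)` returns 0, while slowing service to 5 requests per tick (a crude stand-in for TSAN instrumentation overhead) gives `SimulateOverflows(50, 100, 10, 5) == 455`: the backlog grows by 5 per tick until the queue is full, after which every tick rejects 5 requests. The point of the model is that overflow depends on the ratio of arrival rate to service rate, so either fewer cache resets or faster serving keeps the counter at zero.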