Alexey Serbin created KUDU-3638:
-----------------------------------
Summary: Deficiencies in cleaning up tombstoned tablets lead to
flooding logs and high CPU usage at Kudu master nodes
Key: KUDU-3638
URL: https://issues.apache.org/jira/browse/KUDU-3638
Project: Kudu
Issue Type: Bug
Components: master, tserver
Affects Versions: 1.17.1
Reporter: Alexey Serbin
While implementing
[KUDU-3486|https://issues.apache.org/jira/browse/KUDU-3486], a few deficiencies
were introduced. They manifest themselves at least as follows:
* Tablet servers that host tombstoned replicas of tablets belonging to
still-existing tables report all of those replicas to the leader master with
every incremental heartbeat, starting about 30 minutes after startup or after
the interval customized by the
{{\-\-tserver_send_tombstoned_tablets_report_inteval_secs}} flag
* The leader master floods its INFO log with messages like the one below,
logging the same records again and again while processing every incremental
heartbeat from the tablet servers described in the previous item:
{noformat}
... catalog_manager.cc:5516] TS <ts_UUID> (<ts_node_name>:7050) does not have
the latest schema for tablet <tablet_UUID> (table <table_name>
[id=<table_UUID>]). Expected version A got B
{noformat}
As a temporary workaround for the issue, set
{{\-\-tserver_send_tombstoned_tablets_report_inteval_secs=-1}} on tablet
servers (NOTE: since the flag is runtime by its nature, it's possible to
address the issue without restarting tablet servers by using the
{{kudu tserver set_flag}} CLI tool).
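
The runtime workaround above could be scripted roughly as follows. This is a minimal sketch, not a verified procedure: the tablet server addresses are hypothetical placeholders, the {{DRY_RUN}} switch and the {{set_flag_cmd}} helper are introduced here for illustration only, and the positional form {{kudu tserver set_flag <address> <flag> <value>}} is assumed. Whether the tool accepts the negative value as written, or additionally requires an option such as {{--force}}, has not been checked here, so the script only prints the commands by default.

```shell
#!/bin/sh
# Hypothetical tablet server RPC addresses; replace with the real ones.
TSERVERS="${TSERVERS:-ts1.example.com:7050 ts2.example.com:7050}"

# Flag name as spelled in this issue (including "inteval").
FLAG=tserver_send_tombstoned_tablets_report_inteval_secs

# Print the set_flag command for a single tablet server address.
set_flag_cmd() {
  printf 'kudu tserver set_flag %s %s -1\n' "$1" "$FLAG"
}

for ts in $TSERVERS; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only show what would be run.
    set_flag_cmd "$ts"
  else
    # Assumed invocation; verify the syntax against your Kudu version first.
    kudu tserver set_flag "$ts" "$FLAG" -1
  fi
done
```

Running the script with {{DRY_RUN=0}} applies the change to each listed tablet server. A flag set this way is not persisted, so it would also need to be added to the tablet servers' flag files to survive a restart.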
--
This message was sent by Atlassian Jira
(v8.20.10#820010)