Hi Ozone Devs,

I am currently working on HDDS-7364
<https://issues.apache.org/jira/browse/HDDS-7364> to get Ozone's container
scanner to a point where it can be enabled by default. The container
scanner will check container block data and metadata in the background to
identify corruption, mark containers unhealthy, and notify SCM so a healthy
replica can be copied.

One of the subtasks is HDDS-8062
<https://issues.apache.org/jira/browse/HDDS-8062>, which is to provide a
way to track why containers were marked unhealthy, and persist that
information so it can be referenced later. Datanode application
logs can roll too frequently for this purpose, so I propose adding a new
log to the datanode to track container replica state transitions. This log
would provide useful debugging insight not just for the scanner, but for
any other replica-related issues that may originate on the datanodes. The
design doc is attached to HDDS-8062
<https://issues.apache.org/jira/browse/HDDS-8062> and is also available
directly at
<https://issues.apache.org/jira/secure/attachment/13058801/container_log_v1.pdf>.

This will add a new log and new debugging capabilities to Ozone. Please
review and provide any feedback on this thread or the Jira.

Thanks.
Ethan
