Hi Sankalp,

As long as you have replication, I've found it is safer to delete entire topic-partition directories than to delete individual log segments from them. For one, you get back more space. Second, you don't have to worry about metadata corruption.
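If it helps, this is roughly what that looks like on a broker. It's a sketch, not an exact recipe: the log directory path, topic name, bootstrap address, and retention value are all assumptions you'd adjust for your cluster, and the kafka-configs.sh form below assumes a broker version that accepts --bootstrap-server for topic configs.

```sh
# Find the topic-partition directories using the most space
# (assumes the broker's log.dirs points at /var/kafka-logs).
du -sh /var/kafka-logs/* | sort -rh | head -20

# Lower retention on a heavy topic (hypothetical name) so the
# cleanup threads can reclaim space once the broker is back up.
# 3600000 ms = 1 hour; pick whatever your use case tolerates.
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-heavy-topic \
  --add-config retention.ms=3600000
```

Because the partitions are replicated, any directory you remove from a stopped broker just gets re-fetched from the current leader after the broker rejoins.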
When I've run out of disk space in the past, the first thing I did was reduce topic retention where I could and wait for the log cleanup routines to run. Then, on the brokers with full disks, I looked for and deleted the associated topic-partition directories before starting Kafka on them. When the brokers rejoined the cluster, they started catching up on the deleted topic-partition directories.

-- Peter Bukowinski

> On Mar 25, 2021, at 8:00 AM, Sankalp Bhatia <sankalpbhati...@gmail.com> wrote:
>
> Hi All,
>
> Brokers in one of our Apache Kafka clusters are continuously crashing as
> they have run out of disk space. As per my understanding, reducing the
> value of the retention.ms and retention.bytes properties will not work because
> the broker is crashing before the log-retention thread can be scheduled (
> link
> <https://github.com/apache/kafka/blob/3eaf44ba8ea26a7a820894390e8877d404ddd5a2/core/src/main/scala/kafka/log/LogManager.scala#L394-L398>
> ).
> One option we are exploring is whether we can manually delete some of the old
> segment files to make some space on our data disk for the broker to start up,
> while reducing the retention.ms config at the same time. There is an old
> email thread (link
> <https://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3CCAOG_4Qbwx44T-=vrpkvqgrum8lpmdzl2bxxrgz5c9h1_noh...@mail.gmail.com%3E>)
> which suggests it is safe to do so, but we want to understand whether there have
> been recent changes to topic-partition metadata which we might end up
> corrupting if we try this. If so, are there any tips to get around this
> issue?
>
> Thanks,
> Sankalp