Hello Kafka Team,

I'm facing an issue with *Kafka* running in *KRaft mode* (without ZooKeeper) on a *Windows* environment: I keep getting an AccessDeniedException in the logs when Kafka attempts to delete specific snapshot files.

    [2024-09-14 11:28:22,849] ERROR Error deleting snapshot files C:\tmp\kraft-combined-logs\__cluster_metadata-0\00000000000000943744-0000000001.checkpoint and C:\tmp\kraft-combined-logs\__cluster_metadata-0\00000000000000943744-0000000001.checkpoint.deleted (org.apache.kafka.snapshot.Snapshots)
    java.nio.file.AccessDeniedException: C:\tmp\kraft-combined-logs\__cluster_metadata-0\00000000000000943744-0000000001.checkpoint.deleted
        at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89)

*Environment Details*:

- *Kafka version*: Using KRaft mode (without ZooKeeper)
- *OS*: Windows 10
- *Log retention settings*:
  - log.retention.bytes=1073741824 (1 GB)
  - log.segment.bytes=1073741824 (1 GB)
  - log.retention.hours=24
  - log.retention.check.interval.ms=300000 (5 minutes)

*Issue Description*:

The issue seems to occur when Kafka attempts to *delete snapshot files* in the *metadata log directory*. The files appear to be *locked or still in use*, resulting in the AccessDeniedException. This eventually leads to Kafka becoming unstable and crashing, as it is unable to manage log files effectively.

*What I’ve Tried So Far*:

1. *Configuration Adjustments*:
   - I tried increasing log.segment.bytes and log.retention.bytes to *2 GB* to reduce the frequency of segment rotation and deletion. This didn't resolve the issue, and the AccessDeniedException still occurs during the retention check interval.
2. *Manual File Deletion*:
   - I stopped Kafka, manually deleted the snapshot files that were causing the error, and restarted Kafka. This resolves the issue temporarily, but it recurs after some time when Kafka again tries to delete files automatically.
3. *File Lock Investigation*:
   - I used *Process Explorer* (Windows tool) to check whether any other process was locking the files, but it seems that only Kafka is accessing them.
4. *Kafka Shutdown and Restart*:
   - I ensured that Kafka was fully shut down before restarting it, so that any file locks would be released, but the issue persists.

*Relevant Kafka Configuration*:

Here are the key configurations I’m using that are related to log segments and retention. The comments are on their own lines because a .properties file only treats # as a comment at the start of a line:

    # Log segment size (rotate after reaching 1 GB)
    log.segment.bytes=1073741824

    # Total size-based retention (delete logs after total size exceeds 1 GB)
    log.retention.bytes=1073741824

    # Time-based retention (delete logs after 24 hours)
    log.retention.hours=24

    # Retention check interval (check logs every 5 minutes for deletion)
    log.retention.check.interval.ms=300000

*Questions*:

1. *Is there any specific reason why Kafka might still be holding file locks on these snapshot files, even after segment rotation?*
2. *Could this issue be related to the log retention policies in KRaft mode, and are there any specific settings I should tweak to avoid Kafka trying to delete files that are still in use?*
3. *Has anyone else encountered this AccessDeniedException on Windows, and what steps did you take to resolve it?*
4. *Would increasing log.retention.check.interval.ms to a larger value (e.g., 15 minutes) help Kafka release file locks before trying to delete files?*

*Any Suggestions or Workarounds*:

Any help would be appreciated, as this issue is affecting the stability of our Kafka setup. I'm open to adjusting Kafka settings, investigating file handling issues in Windows, or any other potential solutions that could resolve this.

------------------------------

Thanks in advance!

Regards,
Nishant Chauhan
Senior DevOps Engineer, Xcelore
+91 999 1408040 | www.xcelore.com
nishant.chau...@xcelore.com
A-47, Sector 2, NOIDA, UP - 201301
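
P.S. For anyone who wants to check the "locked or still in use" symptom outside Kafka, below is a minimal Python sketch of a delete probe. It is only a diagnostic idea, not how Kafka itself deletes snapshots. The function name `try_delete` and the parameters `attempts` and `delay_s` are hypothetical and not part of any Kafka tooling. It assumes that a file held open by another handle on Windows surfaces in Python as a PermissionError, which is how such failures are normally reported there.

```python
import os
import time


def try_delete(path, attempts=5, delay_s=1.0):
    """Try to delete `path`, retrying when the OS refuses access.

    Returns the attempt number on which the file was gone, or None if it
    was still undeletable after `attempts` tries.
    Hypothetical helper for diagnosis only; not part of Kafka.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.remove(path)
            return attempt
        except FileNotFoundError:
            # Already gone (for example, removed by the broker in the meantime).
            return attempt
        except PermissionError:
            # Typical result on Windows while another handle keeps the file open.
            if attempt < attempts:
                time.sleep(delay_s)
    return None
```

Calling `try_delete(path_to_leftover_file, attempts=10, delay_s=30)` against a copy of a leftover `.checkpoint.deleted` file would show whether the lock clears after a while (which would bear on question 4) or never clears while the broker is running. It should be run against a test setup, since deleting files under a live metadata log directory is risky.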