Hello,

We are still stuck on the issue where the kafka_2.11-1.1.0 distribution fails to clean up log segments on Windows and takes the entire cluster down, one broker at a time. Extending the retention hours and sizes does not help; it only postpones the failure while the retained segments fill up the hard drive.
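As far as we can tell from the trace below, the rename of the segment to *.deleted fails because the broker still has the segment file open: java.io opens files on Windows without FILE_SHARE_DELETE, so Windows refuses the DELETE access that a rename needs, while the same rename succeeds on Linux. For what it's worth, here is a minimal standalone sketch (our own illustration, not Kafka code; the class and file names are made up) that, in our understanding, should reproduce the same FileSystemException on a Windows box:

import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class OpenHandleRenameRepro {
    public static void main(String[] args) throws Exception {
        // Scratch file standing in for a Kafka segment file.
        Path segment = Files.createTempFile("00000000000000000000", ".log");
        Path renamed = segment.resolveSibling(segment.getFileName() + ".deleted");

        // Keep the file open the way a broker keeps live segments open.
        // java.io.RandomAccessFile does not request FILE_SHARE_DELETE on Windows.
        try (RandomAccessFile raf = new RandomAccessFile(segment.toFile(), "rw")) {
            raf.writeInt(42);

            // Succeeds on Linux; on Windows this is expected to throw
            // java.nio.file.FileSystemException: "The process cannot access the
            // file because it is being used by another process."
            Files.move(segment, renamed, StandardCopyOption.ATOMIC_MOVE);
            System.out.println("Renamed to " + renamed);
        }
    }
}

If that is indeed the cause, retention settings can only delay the rename; they cannot make it succeed.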
Here is the log:

[2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
[2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
[2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
    at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
    at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
    at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
    at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
    at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
    at kafka.log.Log.deleteSegments(Log.scala:1161)
    at kafka.log.Log.deleteOldSegments(Log.scala:1156)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
    at kafka.log.Log.deleteOldSegments(Log.scala:1222)
    at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
    at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
    at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
    at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
        at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
        at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
        ... 32 more
[2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
[2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
    at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
    at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
    at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
    at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
    at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
    at kafka.log.Log.deleteSegments(Log.scala:1161)
    at kafka.log.Log.deleteOldSegments(Log.scala:1156)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
    at kafka.log.Log.deleteOldSegments(Log.scala:1222)
    at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
    at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
    at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
    at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
        at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
        at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
        ... 32 more
[2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
[2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions
__consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
[2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
[2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
[2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)

Could someone please share some ideas on how to rectify this on Windows? And if Kafka will never be supported on Windows, could we perhaps get some official communication saying so?

Regards,