Hi Ted,

I really appreciate the response over the weekend, and thanks for pointing out the JIRAs.
I don't believe whole processes are responsible, but rather individual threads which are still holding the log/index files open via I/O streams. I am trying to walk a single-node setup through a debugger to find out which thread is locking the file. Apologies, but it's a huge application, so it might take me some time to get around it :) Please do update if you find something new.

Regards,

On 12 May 2018 at 22:15, Ted Yu <yuzhih...@gmail.com> wrote:

> There are two outstanding issues, KAFKA-6059 and KAFKA-6200, which bear
> some resemblance.
>
> Can you try to find out how the other process uses the file being deleted?
>
> https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows
> https://www.computerhope.com/issues/ch000714.htm
>
> Cheers
>
> On Sat, May 12, 2018 at 1:42 PM, M. Manna <manme...@gmail.com> wrote:
>
> > Hello,
> >
> > We are still stuck with this issue: the 2.11-1.1.0 distro fails to clean
> > up logs on Windows and brings the entire cluster down one broker at a
> > time. Extending the retention hours and sizes doesn't help, because that
> > burdens the hard drive.
> >
> > Here is the log:
> >
> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
> > > [2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
> > > java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> > >     at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > >     at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > >     at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > >     at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > >     at java.nio.file.Files.move(Files.java:1395)
> > >     at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> > >     at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> > >     at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> > >     at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> > >     at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> > >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> > >     at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> > >     at kafka.log.Log.deleteSegments(Log.scala:1161)
> > >     at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> > >     at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> > >     at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> > >     at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> > >     at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> > >     at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> > >     at scala.collection.immutable.List.foreach(List.scala:392)
> > >     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> > >     at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> > >     at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> > >     at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> > >     at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >     at java.lang.Thread.run(Thread.java:748)
> > >     Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > >         at java.nio.file.Files.move(Files.java:1395)
> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> > >         ... 32 more
> > > [2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
> > > [2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
> > > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
> > > Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> > >     at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > >     at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > >     at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > >     at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > >     at java.nio.file.Files.move(Files.java:1395)
> > >     at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> > >     at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> > >     at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> > >     at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> > >     at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> > >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> > >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> > >     at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> > >     at kafka.log.Log.deleteSegments(Log.scala:1161)
> > >     at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> > >     at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> > >     at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> > >     at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> > >     at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> > >     at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> > >     at scala.collection.immutable.List.foreach(List.scala:392)
> > >     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> > >     at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> > >     at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> > >     at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> > >     at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >     at java.lang.Thread.run(Thread.java:748)
> > >     Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > >         at java.nio.file.Files.move(Files.java:1395)
> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> > >         ... 32 more
> > > [2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
> > > [2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
> > > [2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
> > > [2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
> > > [2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)
> >
> > Could someone please share some ideas on how to rectify this on Windows?
> > If this will never be supported on Windows, could we get some official
> > communication perhaps?
> >
> > Regards,
> >
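
For anyone who wants to reproduce the failing step outside the broker: the log shows `Log.asyncDeleteSegment` renaming `x.log` to `x.log.deleted` via `Files.move(..., ATOMIC_MOVE)` while some thread still holds the segment open. A minimal sketch of that situation is below. This is not Kafka code; the class and method names are my own, and the behaviour is OS-dependent: on Windows the rename typically fails with a `FileSystemException` (sharing violation) while the handle is open, whereas on POSIX systems it succeeds.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameWhileOpen {

    // Attempts the same kind of rename the broker performs when scheduling a
    // segment for deletion (x.log -> x.log.deleted) while another stream
    // still holds the file open.
    static String renameWhileHeldOpen() throws IOException {
        Path dir = Files.createTempDirectory("segment-repro");
        Path log = dir.resolve("00000000000000000000.log");
        Path deleted = dir.resolve("00000000000000000000.log.deleted");
        Files.write(log, new byte[] {0, 1, 2});

        // Keep a read handle open, mimicking a reader or cleaner thread.
        try (FileInputStream in = new FileInputStream(log.toFile())) {
            try {
                Files.move(log, deleted, StandardCopyOption.ATOMIC_MOVE);
                return "renamed while open (POSIX semantics)";
            } catch (FileSystemException e) {
                // On Windows the open handle causes a sharing violation here,
                // matching the error in the broker log above.
                return "sharing violation: " + e.getReason();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renameWhileHeldOpen());
    }
}
```

If the rename fails on Windows here but succeeds once the `FileInputStream` is closed, that supports the theory that a still-open stream (rather than a separate process) is what blocks the segment deletion.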