There are two outstanding issues, KAFKA-6059 and KAFKA-6200, which bear some
resemblance to what you are seeing.
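
Both come down to Windows file-handle semantics: unlike POSIX systems, Windows
will typically refuse to rename or delete a file while another open handle or
an active memory mapping still refers to it. As a rough illustration (not the
actual broker code; the file names are made up), the following Java snippet
usually reproduces the same FileSystemException when run on Windows:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class WindowsRenameRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names, for illustration only.
        Path log = Paths.get("00000000000000000000.log");
        Files.write(log, new byte[4096]);

        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            // Kafka memory-maps its index files; a live mapping keeps a Windows
            // handle open that the JVM does not release deterministically.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mapped.put(0, (byte) 1);

            // On Windows this rename typically fails with the same
            // FileSystemException seen in the broker log; on Linux it succeeds.
            Files.move(log, Paths.get("00000000000000000000.log.deleted"),
                    StandardCopyOption.ATOMIC_MOVE);
        }
    }
}

Note that the "other process" in the error message can also be the broker's own
JVM: Windows reports a sharing violation with that wording even when the
conflicting handle belongs to the same process, and Kafka keeps segment and
index files open while the retention thread tries to rename them.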

Still, can you try to find out which other process is using the file being deleted?

https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows
https://www.computerhope.com/issues/ch000714.htm
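
As an aside, the "Suppressed:" FileSystemException in your trace is expected
noise rather than a second bug: Utils.atomicMoveWithFallback first tries an
atomic rename and, when that throws, retries a plain move, attaching the first
failure as a suppressed exception. A simplified sketch of that logic (inferred
from the Utils.java:694/697 frames in your trace, not the verbatim Kafka
source):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Simplified sketch of org.apache.kafka.common.utils.Utils.atomicMoveWithFallback. */
public final class MoveSketch {
    public static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            // First attempt: atomic rename (the suppressed frame, Utils.java:694).
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                // Fallback: plain move (the outer frame, Utils.java:697).
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException inner) {
                // Keep the atomic-move failure attached for diagnostics.
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }
}

So both attempts hit the same sharing violation on your broker; the suppressed
entry is just the record of the first try.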

Cheers

On Sat, May 12, 2018 at 1:42 PM, M. Manna <manme...@gmail.com> wrote:

> Hello,
>
> We are still stuck with this issue: the kafka_2.11-1.1.0 distribution fails
> to clean up logs on Windows and brings the entire cluster down, broker by
> broker. Extending the retention hours and sizes doesn't help, because it just
> burdens the hard drive.
>
> Here is the log:
>
> > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
> > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
> > [2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
> > java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >         at java.nio.file.Files.move(Files.java:1395)
> >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> >         at kafka.log.Log.deleteSegments(Log.scala:1161)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >         at scala.collection.immutable.List.foreach(List.scala:392)
> >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >         at java.lang.Thread.run(Thread.java:748)
> >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >                 at java.nio.file.Files.move(Files.java:1395)
> >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> >                 ... 32 more
> > [2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
> > [2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
> > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
> > Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >         at java.nio.file.Files.move(Files.java:1395)
> >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> >         at kafka.log.Log.deleteSegments(Log.scala:1161)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >         at scala.collection.immutable.List.foreach(List.scala:392)
> >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >         at java.lang.Thread.run(Thread.java:748)
> >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >                 at java.nio.file.Files.move(Files.java:1395)
> >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> >                 ... 32 more
> > [2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
> > [2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
> > [2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions  because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
> > [2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
> > [2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)
> >
>
> Could someone please share some ideas on how to rectify this on Windows? If
> Windows will never be supported, could we perhaps get some official
> communication saying so?
>
> Regards,
>
