It looks like you tried to attach a screenshot, but it didn't come through. Examining LogSegment.scala, I wonder whether the time index was being appended to (which would cause the deletion to fail):

    // append an entry to the index (if needed)
    if (bytesSinceLastIndexEntry > indexIntervalBytes) {
      offsetIndex.append(largestOffset, physicalPosition)
      timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp)
      ...
    }
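One more data point from the log you posted earlier (quoted below): the thrown FileSystemException comes out of Utils.atomicMoveWithFallback at Utils.java:697, with a Suppressed one at Utils.java:694. If I'm reading the line numbers right, the atomic rename failed first and the plain-move fallback then failed the same way. That is the classic signature of Windows refusing to rename a file that still has a live memory mapping. Here is a minimal standalone sketch that reproduces the failure; this is my own illustration, not Kafka code, and the paths are made up:

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel.MapMode
    import java.nio.file.{Files, Paths, StandardCopyOption}

    object MmapRenameRepro extends App {
      val path = Paths.get("C:\\tmp\\00000000000000000000.timeindex")
      val raf = new RandomAccessFile(path.toFile, "rw")
      raf.setLength(1024)
      // Map the file the way Kafka maps its offset/time index files.
      val buffer = raf.getChannel.map(MapMode.READ_WRITE, 0, 1024)
      raf.close() // closing the channel does NOT release the mapping

      // While `buffer` is still reachable, Windows fails this move with
      // "The process cannot access the file because it is being used by
      // another process"; on Linux the same move succeeds.
      Files.move(path, Paths.get("C:\\tmp\\00000000000000000000.timeindex.deleted"),
        StandardCopyOption.ATOMIC_MOVE)
    }

So if the time index was appended to (and hence still mapped) after the delete was scheduled, that alone could keep the rename failing.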
Can you capture the stack trace and pastebin it? If LogSegment shows up in the stack trace, we may have some idea of the root cause.

Thanks

On Mon, May 14, 2018 at 8:31 AM, M. Manna <manme...@gmail.com> wrote:

> I am having some difficulties debugging the cluster. In the Kafka CWIKI I
> can see a setup guide for Eclipse, but nothing for debugger setup. The
> issue is that every time the log cleaner thread runs, it hits a FATAL
> shutdown.
> I tried to close all producers and consumers, but it still locks the log
> and transaction index segments. Forcefully unmapping the offset and time
> index files works normally, but the log and transaction index files don't
> release properly.
>
> Could anyone please advise how to set up Kafka debugging in Eclipse? I
> have added the sdb jar noted on the scala-debugger.org site, but I
> couldn't connect to port 5005 at all.
>
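On the debugger question above: the sdb jar should not be needed just to attach on port 5005; the broker JVM itself has to be started with the JDWP agent before Eclipse can connect. A minimal sketch, assuming the stock Windows scripts pass KAFKA_OPTS through to the JVM (I have not verified this on Windows):

    set KAFKA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
    bin\windows\kafka-server-start.bat config\server.properties

With the broker started that way, an Eclipse "Remote Java Application" debug configuration pointed at localhost:5005 should be able to attach.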
> On 13 May 2018 at 10:00, M. Manna <manme...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> I highly appreciate the response over the weekend, and thanks for
>> pointing out the JIRAs.
>>
>> I don't believe the processes are responsible, but rather individual
>> threads which are still holding the log/index files through IO streams. I
>> am trying to walk a single-node setup through the debugger to find out
>> which thread is locking the file. Apologies, but it's a huge application,
>> so it might take me some time to get around it :)
>>
>> Please do update if you find something new.
>>
>> Regards,
>>
>> On 12 May 2018 at 22:15, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> There are two outstanding issues, KAFKA-6059 and KAFKA-6200, which bear
>>> some resemblance.
>>>
>>> Can you try to find how the other process uses the file being deleted?
>>>
>>> https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows
>>> https://www.computerhope.com/issues/ch000714.htm
>>>
>>> Cheers
>>>
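To make the first link above concrete: assuming the Sysinternals tools are available on the broker machine, handle.exe can search open handles by file name, e.g.

    handle.exe 00000000000000000000.log

If only the broker's own java.exe shows up, the "other process" is really the broker itself. Note that a pure memory mapping whose original file handle has been closed may not appear in the handle list at all; that outcome would itself point back at the unreleased-mapping theory.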
>>> On Sat, May 12, 2018 at 1:42 PM, M. Manna <manme...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > We are still stuck with this issue, where the 2.11-1.1.0 distro is
>>> > failing to clean up logs on Windows and brings the entire cluster down
>>> > one node at a time. Extending the retention hours and sizes doesn't
>>> > help, because it burdens the hard drive.
>>> >
>>> > Here is the log:
>>> >
>>> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
>>> > > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
>>> > > [2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
>>> > > java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>>> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >         at java.nio.file.Files.move(Files.java:1395)
>>> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
>>> > >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
>>> > >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
>>> > >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
>>> > >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> > >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>>> > >         at kafka.log.Log.deleteSegments(Log.scala:1161)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
>>> > >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
>>> > >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>>> > >         at scala.collection.immutable.List.foreach(List.scala:392)
>>> > >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>>> > >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
>>> > >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
>>> > >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>>> > >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>>> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> > >         at java.lang.Thread.run(Thread.java:748)
>>> > >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
>>> > >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >                 at java.nio.file.Files.move(Files.java:1395)
>>> > >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
>>> > >                 ... 32 more
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
>>> > > [2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
>>> > > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
>>> > > Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>>> > >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >         at java.nio.file.Files.move(Files.java:1395)
>>> > >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
>>> > >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
>>> > >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
>>> > >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
>>> > >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
>>> > >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> > >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
>>> > >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>>> > >         at kafka.log.Log.deleteSegments(Log.scala:1161)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
>>> > >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
>>> > >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
>>> > >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
>>> > >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>>> > >         at scala.collection.immutable.List.foreach(List.scala:392)
>>> > >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>>> > >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
>>> > >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
>>> > >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>>> > >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>>> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> > >         at java.lang.Thread.run(Thread.java:748)
>>> > >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
>>> > >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>>> > >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>>> > >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
>>> > >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>>> > >                 at java.nio.file.Files.move(Files.java:1395)
>>> > >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
>>> > >                 ... 32 more
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions
>>> > > __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,
>>> > > __consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,
>>> > > __consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,
>>> > > __consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,
>>> > > __consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,
>>> > > __consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,
>>> > > test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,
>>> > > __consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,
>>> > > __consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,
>>> > > __consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,
>>> > > __consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,
>>> > > __consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,
>>> > > __consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40
>>> > > (kafka.server.ReplicaFetcherManager)
>>> > > [2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions
>>> > > __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,
>>> > > __consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,
>>> > > __consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,
>>> > > __consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,
>>> > > __consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,
>>> > > __consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,
>>> > > test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,
>>> > > __consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,
>>> > > __consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,
>>> > > __consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,
>>> > > __consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,
>>> > > __consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,
>>> > > __consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40
>>> > > (kafka.server.ReplicaAlterLogDirsManager)
>>> > > [2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions
>>> > > __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,
>>> > > __consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,
>>> > > __consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,
>>> > > __consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,
>>> > > __consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,
>>> > > __consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,
>>> > > test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,
>>> > > __consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,
>>> > > __consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,
>>> > > __consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,
>>> > > __consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,
>>> > > __consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,
>>> > > __consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40
>>> > > and stopped moving logs for partitions because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
>>> > > [2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
>>> > > [2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)
>>> >
>>> > Could someone please share some ideas on how to rectify this on
>>> > Windows? If it will never be supported on Windows, could we perhaps
>>> > get some official communication to that effect?
>>> >
>>> > Regards,