[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao updated KAFKA-903:
--------------------------
    Attachment: kafka-903_v3.patch

Attached patch v3. To address Jay's concern, instead of using a generic renameTo util, the patch only falls back to the non-atomic renameTo when checkpointing the high watermark file. Since both files are in the same dir and we control the naming, the other causes you listed that can make renameTo fail won't apply. I didn't do the OS-level checking since I am not sure that it works well for environments like cygwin. We could guard this under a broker config parameter, but I am not sure if it's worth it.

For Sriram's concern, this seems to be a problem for at least some versions of Java on Windows, since other projects like Hadoop (https://issues.apache.org/jira/browse/HADOOP-959) have also seen this before.

> [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
> -------------------------------------------------------------------------------
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx------+ 1 reefedjib None 41123 Mar 19 2009 commons-cli-1.2.jar
> -rwx------+ 1 reefedjib None 58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx------+ 1 reefedjib None 575389 Apr 18 13:41 commons-collections-3.2.1.jar
> -rwx------+ 1 reefedjib None 143847 May 21 2009 commons-compress-1.0.jar
> -rwx------+ 1 reefedjib None 52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx------+ 1 reefedjib None 57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx------+ 1 reefedjib None 109043 Jan 20 2008 commons-io-1.4.jar
> -rwx------+ 1 reefedjib None 279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx------+ 1 reefedjib None 60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx------+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx------+ 1 reefedjib None 206866 Apr 7 21:24 jackson-core-2.1.4.jar
> -rwx------+ 1 reefedjib None 232245 Apr 7 21:24 jackson-core-asl-1.9.12.jar
> -rwx------+ 1 reefedjib None 69314 Apr 7 21:24 jackson-dataformat-smile-2.1.4.jar
> -rwx------+ 1 reefedjib None 780385 Apr 7 21:24 jackson-mapper-asl-1.9.12.jar
> -rwx------+ 1 reefedjib None 47913 May 9 23:39 jopt-simple-3.0-rc2.jar
> -rwx------+ 1 reefedjib None 2365575 Apr 30 13:06 kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx------+ 1 reefedjib None 481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx------+ 1 reefedjib None 20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx------+ 1 reefedjib None 251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx------+ 1 reefedjib None 349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx------+ 1 reefedjib None 82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx------+ 1 reefedjib None 1540457 Jul 12 2012 ojdbc14.jar
> -rwx------+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx------+ 1 reefedjib None 3114958 Apr 2 07:47 scalatest_2.10-1.9.1.jar
> -rwx------+ 1 reefedjib None 25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx------+ 1 reefedjib None 62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx------+ 1 reefedjib None 601677 Apr 18 13:41 zookeeper-3.3.3.jar
> Reporter: Rob Withers
> Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on windows,
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer sending messages with correlation id 178 for topics [robert_v_2x0,0] to broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
> {2013-05-10 18:23:57,739} INFO [Thread-4] (Logging.scala:67) - [Kafka Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156} INFO [Thread-1] (Logging.scala:67) - [Kafka Server 0], started
> {2013-05-10 19:14:52,157} INFO [ZkClient-EventThread-32-localhost:2181] (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of /controller_epoch changed sent to kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] (FinalRequestProcessor.java:78) - Processing request:: sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffffffffffffffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffffffffffffffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] (ClientCnxn.java:838) - Reading reply sessionid:0x13e9127882e0001, packet:: clientPath:null serverPath:null finished:false header:: 29,3 replyHeader:: 29,37,0 request:: '/controller_epoch,T response:: s{16,36,1368231712816,1368234889961,1,0,0,0,1,0,16}
> {2013-05-10 19:14:52,219} INFO [Thread-5] (Logging.scala:67) - [Kafka Server 0], shutting down
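
For readers following the thread, the fallback described in the v3 comment above can be sketched roughly as below. This is only an illustration of the idea, not the code in kafka-903_v3.patch: the class name CheckpointSwap, the method swapCheckpoint, the ".tmp" suffix and the file name used in main are all hypothetical. It assumes, as the comment does, that the temp file and the target live in the same directory. The only library calls used are java.io.File.renameTo/delete and java.nio.file.Files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch of "try rename, fall back to non-atomic delete + rename".
public class CheckpointSwap {

    // Writes contents to "<target>.tmp" (same directory), then swaps it over target.
    static void swapCheckpoint(File target, String contents) throws IOException {
        File tmp = new File(target.getAbsolutePath() + ".tmp"); // hypothetical naming
        Files.write(tmp.toPath(), contents.getBytes(StandardCharsets.UTF_8));

        // First attempt: rename over the existing file. File.renameTo is
        // platform dependent and may return false if the target already exists.
        if (tmp.renameTo(target)) {
            return;
        }

        // Fallback: delete the old file, then rename. This is NOT atomic:
        // a crash between the two steps leaves only the .tmp file behind.
        target.delete();
        if (!tmp.renameTo(target)) {
            throw new IOException("Could not swap " + tmp + " into " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("hw-checkpoint").toFile();
        File target = new File(dir, "highwatermark.checkpoint"); // hypothetical file name
        swapCheckpoint(target, "first");
        swapCheckpoint(target, "second"); // target already exists on this call
        String onDisk = new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8);
        System.out.println(onDisk);
    }
}
```

Restricting the fallback to this one call site, rather than a generic rename utility, keeps the non-atomic window limited to a file whose name and directory the broker controls.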