Re: 0.7.3?

2013-06-03 Thread Chris Burroughs
Looking at
https://issues.apache.org/jira/browse/KAFKA#selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel

I didn't see a 0.7.3, so I created one.  Jira also thinks 0.7.2 is still
unreleased, with KAFKA-411 open, but I'm not sure where that should belong.


On 05/23/2013 05:25 PM, Neha Narkhede wrote:
> Do you mind filing a JIRA for this ? Feel free to upload a patch.
> 
> Thanks,
> Neha
> 
> 
> On Thu, May 23, 2013 at 12:25 PM, Chris Burroughs wrote:
> 
>> Did this ever get traction?
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=summary shows
>> activity in the 0.7 branch but the last tag as
>> kafka-0.7.2-incubating-candidate-5
>>
>>
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201302.mbox/%3cce36b916-a8b9-40c3-8a0e-958397c17...@gmail.com%3E
>>
> 


[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673182#comment-13673182
 ] 

Jun Rao commented on KAFKA-927:
---

Thanks for patch v2. A few more comments:

20. KafkaController: If the controller is no longer active when shutdownBroker 
is called, both state machines will throw an exception on state change calls. 
However, the issue is that we add the shutdown broker to 
controllerContext.shuttingDownBrokerIds and it is never reset. This may become a 
problem if this broker becomes the controller again. At a minimum, we need to 
reset controllerContext.shuttingDownBrokerIds in onControllerFailover(). 
However, I am a bit confused about why the shutdown logic still works even 
though we never reset controllerContext.shuttingDownBrokerIds.

21. ControlledShutdownRequest.handleError(): We should probably set 
partitionsRemaining in ControlledShutdownResponse to empty instead of null, 
since the serialization of ControlledShutdownResponse doesn't handle 
partitionsRemaining being null.
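The rationale in 21 can be shown with a small sketch: a size-prefixed wire
format serializes an empty collection naturally but throws an NPE on null. The
names below are hypothetical simplifications, not the actual
ControlledShutdownResponse code.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of a response with a size-prefixed partition list.
class ShutdownResponseSketch {
    final short errorCode;
    final Set<Integer> partitionsRemaining; // invariant: never null

    ShutdownResponseSketch(short errorCode, Set<Integer> partitionsRemaining) {
        this.errorCode = errorCode;
        // Normalize null to empty so writeTo() can always call size() and iterate.
        this.partitionsRemaining = partitionsRemaining == null
                ? Collections.emptySet() : partitionsRemaining;
    }

    // Serialization writes a count followed by the elements; it has no
    // representation for null, which is why the error path must use empty.
    void writeTo(ByteBuffer buf) {
        buf.putShort(errorCode);
        buf.putInt(partitionsRemaining.size());
        for (int p : partitionsRemaining) buf.putInt(p);
    }

    // What handleError() should produce per the comment: empty, not null.
    static ShutdownResponseSketch forError(short errorCode) {
        return new ShutdownResponseSketch(errorCode, Collections.emptySet());
    }
}
```

An error response then serializes to just the error code plus a zero count, and
no caller ever needs a null check.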

22. testRollingBounce:
22.1 The test makes sure that the leader for topic1 is changed after broker 0 
is shutdown. However, the leader for topic1 could be on broker 1 initially. In 
this case, the leader won't be changed after broker 0 is shutdown.
22.2 The default controlledShutdownRetryBackoffMs is 5 seconds, which is 
probably too long for the unit test. 

23. KafkaServer: We need to handle the errorCode in ControlledShutdownResponse 
since the controller may have moved after we send the ControlledShutdown 
request.

From the previous review:
3. I think a simple solution is to (1) not call 
replicaManager.replicaFetcherManager.closeAllFetchers() in KafkaServer during 
shutdown; (2) in KafkaController.shutdownBroker(), for each partition on the 
shutdown broker, first send a stopReplicaRequest to it for that partition 
before going through the state machine logic. Since the state machine logic 
involves ZK reads/writes, it's very likely that the stopReplicaRequest will 
reach the broker before the subsequent LeaderAndIsr requests. So, in most 
cases, the leader should be able to shrink the ISR quicker than the timeout, 
without churn in the ISR.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch
>
>
> The controlled shutdown mechanism should be integrated into the software for 
> better operational benefits. Also, a few optimizations can be done to reduce 
> unnecessary RPC and ZK calls. This patch has been tested in a prod-like 
> environment by doing rolling bounces continuously for a day. The average time 
> of a rolling bounce with controlled shutdown for a 7-node cluster is 340 
> seconds without this patch; with this patch it drops to 220 seconds. It also 
> ensures correctness in scenarios where the controller shrinks the ISR and the 
> new leader could place the broker being shut down back into the ISR.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-927:
-

Attachment: KAFKA-927-v2-revised.patch

Realized my previous patch did not have my latest changes, just the new files.

20. shuttingDownBrokerIds does get updated on broker failure.
21. Done.
22.1 I had already fixed this. The new patch should have the change.
23. This is also handled in the new patch.

3. That sounds reasonable among all the hacky fixes. 

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch
>



[jira] [Closed] (KAFKA-897) NullPointerException in ConsoleConsumer

2013-06-03 Thread Colin B. (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin B. closed KAFKA-897.
--


> NullPointerException in ConsoleConsumer
> ---
>
> Key: KAFKA-897
> URL: https://issues.apache.org/jira/browse/KAFKA-897
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Colin B.
>Assignee: Neha Narkhede
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: Kafka897-v1.patch, KAFKA-897-v2.patch
>
>
> The protocol document [1] mentions that keys and values in message sets can 
> be null. However the ConsoleConsumer throws a NPE when a null is passed for 
> the value.
> java.lang.NullPointerException
> at kafka.utils.Utils$.readBytes(Utils.scala:141)
> at 
> kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:106)
> at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:61)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:53)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at kafka.consumer.KafkaStream.foreach(KafkaStream.scala:25)
> at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:195)
> at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)
> [1] 
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets



[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673380#comment-13673380
 ] 

Neha Narkhede commented on KAFKA-927:
-

Thanks for the revised v2 patch. A few more comments:

1. KafkaServer
1.1 startupComplete should be either a volatile variable or an AtomicBoolean. 
Two different threads call startup() and controlledShutdown(), both of which 
modify startupComplete.
1.2 In controlledShutdown(), we need to handle error codes in 
ControlledShutdownResponse explicitly. It can happen that the error code is set 
while partitionsRemaining is empty, which will lead to errors.
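The concern in 1.1 is safe publication: without volatile or an AtomicBoolean,
the thread running controlledShutdown() may never observe a write to a plain
boolean made by the thread that ran startup(). A minimal sketch with
hypothetical names, not the real KafkaServer fields:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the flag shared between the startup and shutdown threads.
// AtomicBoolean (a volatile boolean would also work) establishes a
// happens-before edge between set() in one thread and get() in another.
class ServerLifecycleSketch {
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);

    void startup() {
        // ... bring up request handlers, replica manager, etc. ...
        startupComplete.set(true);
    }

    boolean controlledShutdown() {
        if (!startupComplete.get()) {
            return false; // server never finished starting; nothing to do
        }
        // ... send ControlledShutdownRequest to the controller, retry, etc. ...
        startupComplete.set(false);
        return true;
    }
}
```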

2. Partition

From previous review #4: if the broker has to ignore the become-follower 
request anyway, does it make sense to even process part of it and truncate the 
log, etc.?

3. From previous review #3, I meant that it is pointless to do the ZK write on 
the controller, because right after the write the follower hasn't yet received 
the stop replica request and the leader hasn't yet received the shrunk ISR, so 
the broker being shut down will get added back to the ISR. You can verify from 
the logs that this happens. It also makes controlled shutdown very slow: in 
production we typically move ~1000 partitions off the broker, and ZK writes can 
take ~20 ms each, so the serial writes alone waste on the order of 20 seconds. 
Instead, it is enough to let the leader shrink the ISR by sending it the leader 
and isr request. On the other hand, one could argue that the OfflineReplica 
state change itself should be changed to avoid the ZK write. But that is a 
bigger change, so we should avoid it right now.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch
>



[jira] [Created] (KAFKA-929) Download link in 0.7 quickstart broken

2013-06-03 Thread David Arthur (JIRA)
David Arthur created KAFKA-929:
--

 Summary: Download link in 0.7 quickstart broken
 Key: KAFKA-929
 URL: https://issues.apache.org/jira/browse/KAFKA-929
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: David Arthur


http://kafka.apache.org/07/quickstart.html

links to http://kafka.apache.org/07/downloads.html, instead of 
http://kafka.apache.org/downloads.html



[jira] Subscription: outstanding kafka patches

2013-06-03 Thread jira
Issue Subscription
Filter: outstanding kafka patches (76 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-928   new topics may not be processed after ZK session expiration in 
controller
https://issues.apache.org/jira/browse/KAFKA-928
KAFKA-927   Integrate controlled shutdown into kafka shutdown hook
https://issues.apache.org/jira/browse/KAFKA-927
KAFKA-925   Add optional partition key override in producer
https://issues.apache.org/jira/browse/KAFKA-925
KAFKA-923   Improve controller failover latency
https://issues.apache.org/jira/browse/KAFKA-923
KAFKA-922   System Test - set retry.backoff.ms=300 to testcase_0119
https://issues.apache.org/jira/browse/KAFKA-922
KAFKA-917   Expose zk.session.timeout.ms in console consumer
https://issues.apache.org/jira/browse/KAFKA-917
KAFKA-915   System Test - Mirror Maker testcase_5001 failed
https://issues.apache.org/jira/browse/KAFKA-915
KAFKA-911   Bug in controlled shutdown logic in controller leads to controller 
not sending out some state change request 
https://issues.apache.org/jira/browse/KAFKA-911
KAFKA-905   Logs can have same offsets causing recovery failure
https://issues.apache.org/jira/browse/KAFKA-905
KAFKA-903   [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
(Logging.scala:109) - Attempt to swap the new high watermark file with the old 
one failed
https://issues.apache.org/jira/browse/KAFKA-903
KAFKA-898   Add a KafkaMetricsReporter that wraps Librato's reporter
https://issues.apache.org/jira/browse/KAFKA-898
KAFKA-896   merge 0.8 (988d4d8e65a14390abd748318a64e281e4a37c19) to trunk
https://issues.apache.org/jira/browse/KAFKA-896
KAFKA-885   sbt package builds two kafka jars
https://issues.apache.org/jira/browse/KAFKA-885
KAFKA-881   Kafka broker not respecting log.roll.hours
https://issues.apache.org/jira/browse/KAFKA-881
KAFKA-877   Still getting kafka.common.NotLeaderForPartitionException
https://issues.apache.org/jira/browse/KAFKA-877
KAFKA-873   Consider replacing zkclient with curator (with zkclient-bridge)
https://issues.apache.org/jira/browse/KAFKA-873
KAFKA-868   System Test - add test case for rolling controlled shutdown
https://issues.apache.org/jira/browse/KAFKA-868
KAFKA-863   System Test - update 0.7 version of kafka-run-class.sh for 
Migration Tool test cases
https://issues.apache.org/jira/browse/KAFKA-863
KAFKA-859   support basic auth protection of mx4j console
https://issues.apache.org/jira/browse/KAFKA-859
KAFKA-855   Ant+Ivy build for Kafka
https://issues.apache.org/jira/browse/KAFKA-855
KAFKA-854   Upgrade dependencies for 0.8
https://issues.apache.org/jira/browse/KAFKA-854
KAFKA-852   Remove clientId from OffsetFetchResponse and OffsetCommitResponse
https://issues.apache.org/jira/browse/KAFKA-852
KAFKA-836   Update quickstart for Kafka 0.8
https://issues.apache.org/jira/browse/KAFKA-836
KAFKA-835   Update 0.8 configs on the website
https://issues.apache.org/jira/browse/KAFKA-835
KAFKA-815   Improve SimpleConsumerShell to take in a max messages config option
https://issues.apache.org/jira/browse/KAFKA-815
KAFKA-745   Remove getShutdownReceive() and other kafka specific code from the 
RequestChannel
https://issues.apache.org/jira/browse/KAFKA-745
KAFKA-739   Handle null values in Message payload
https://issues.apache.org/jira/browse/KAFKA-739
KAFKA-735   Add looping and JSON output for ConsumerOffsetChecker
https://issues.apache.org/jira/browse/KAFKA-735
KAFKA-717   scala 2.10 build support
https://issues.apache.org/jira/browse/KAFKA-717
KAFKA-705   Controlled shutdown doesn't seem to work on more than one broker in 
a cluster
https://issues.apache.org/jira/browse/KAFKA-705
KAFKA-686   0.8 Kafka broker should give a better error message when running 
against 0.7 zookeeper
https://issues.apache.org/jira/browse/KAFKA-686
KAFKA-682   java.lang.OutOfMemoryError: Java heap space
https://issues.apache.org/jira/browse/KAFKA-682
KAFKA-677   Retention process gives exception if an empty segment is chosen for 
collection
https://issues.apache.org/jira/browse/KAFKA-677
KAFKA-674   Clean Shutdown Testing - Log segments checksums mismatch
https://issues.apache.org/jira/browse/KAFKA-674
KAFKA-652   Create testcases for clean shut-down
https://issues.apache.org/jira/browse/KAFKA-652
KAFKA-649   Cleanup log4j logging
https://issues.apache.org/jira/browse/KAFKA-649
KAFKA-645   Create a shell script to run System Test with DEBUG details and 
"tee" console output to a file
https://issues.apache.org/jira/browse/KAFKA-645
KAFKA-637   Separate l

[jira] [Updated] (KAFKA-928) new topics may not be processed after ZK session expiration in controller

2013-06-03 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-928:


Attachment: kafka-928-v2.patch

I think you are right, we don't need both anymore. See the updated patch.

> new topics may not be processed after ZK session expiration in controller
> -
>
> Key: KAFKA-928
> URL: https://issues.apache.org/jira/browse/KAFKA-928
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Neha Narkhede
>Priority: Blocker
> Attachments: kafka-928.patch, kafka-928-v2.patch
>
>
> When the controller loses its ZK session, it calls 
> partitionStateMachine.shutdown in SessionExpirationListener, which marks the 
> partitionStateMachine as down. However, when the controller regains the 
> controllership, it doesn't mark the partitionStateMachine as up. In 
> TopicChangeListener, we only process new topics if the partitionStateMachine 
> is marked up.



[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-927:
-

Attachment: KAFKA-927-v3.patch

1.1 Done.
1.2 Done.
2. We would need to do some of this anyway to ensure the new leader is updated, 
and the log itself is going to be truncated either on startup or on shutdown. 
Hence I did not see a strong reason to optimize this path further.

3. As we spoke offline, there seems to be an edge case where not updating ZK 
could lead to bad things happening. So I am updating ZK before the LeaderAndIsr 
request.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
>



[jira] [Commented] (KAFKA-928) new topics may not be processed after ZK session expiration in controller

2013-06-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673533#comment-13673533
 ] 

Jun Rao commented on KAFKA-928:
---

Thanks for patch v2. +1.

> new topics may not be processed after ZK session expiration in controller
> -
>
> Key: KAFKA-928
> URL: https://issues.apache.org/jira/browse/KAFKA-928
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Neha Narkhede
>Priority: Blocker
> Attachments: kafka-928.patch, kafka-928-v2.patch
>



[jira] [Updated] (KAFKA-928) new topics may not be processed after ZK session expiration in controller

2013-06-03 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-928:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review; committed the patch to 0.8.

> new topics may not be processed after ZK session expiration in controller
> -
>
> Key: KAFKA-928
> URL: https://issues.apache.org/jira/browse/KAFKA-928
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Neha Narkhede
>Priority: Blocker
> Attachments: kafka-928.patch, kafka-928-v2.patch
>



[jira] [Closed] (KAFKA-928) new topics may not be processed after ZK session expiration in controller

2013-06-03 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede closed KAFKA-928.
---


> new topics may not be processed after ZK session expiration in controller
> -
>
> Key: KAFKA-928
> URL: https://issues.apache.org/jira/browse/KAFKA-928
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Neha Narkhede
>Priority: Blocker
> Attachments: kafka-928.patch, kafka-928-v2.patch
>



[jira] [Commented] (KAFKA-928) new topics may not be processed after ZK session expiration in controller

2013-06-03 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673571#comment-13673571
 ] 

Swapnil Ghike commented on KAFKA-928:
-

Was just about to comment: perhaps it would be good to rename hasStarted to 
isRunning, as in KafkaController. +1 otherwise.

> new topics may not be processed after ZK session expiration in controller
> -
>
> Key: KAFKA-928
> URL: https://issues.apache.org/jira/browse/KAFKA-928
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Neha Narkhede
>Priority: Blocker
> Attachments: kafka-928.patch, kafka-928-v2.patch
>



[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673651#comment-13673651
 ] 

Jun Rao commented on KAFKA-927:
---

Thanks for patch v3. A few more comments:

30. KafkaServer:
30.1 Could you combine isShuttingDown and startupComplete?
30.2 In controlledShutdown(), it's not clear that it's worth caching the socket 
channel. Technically, it's possible for the controller to come back on a broker 
with the same id but a different host/port. It's simpler to just always close 
the socket channel after each ControlledShutdownRequest and create a new 
channel on retry.
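The suggestion in 30.2 amounts to a retry loop that re-resolves the controller
and connects afresh on every attempt. The sketch below uses hypothetical hooks
rather than actual Kafka APIs: resolve() stands in for reading the current
controller's host/port (e.g. from ZooKeeper), and attempt() for opening a fresh
channel, sending a ControlledShutdownRequest, and closing the channel.

```java
import java.net.InetSocketAddress;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of "always reconnect on retry" from review comment 30.2.
class ControlledShutdownRetrySketch {
    static boolean shutdownWithRetries(Supplier<InetSocketAddress> resolve,
                                       Function<InetSocketAddress, Boolean> attempt,
                                       int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            // Re-resolve on every attempt: the controller may have moved to a
            // broker with a different host/port, so a channel cached by broker
            // id alone could point at the wrong endpoint.
            InetSocketAddress controller = resolve.get();
            if (attempt.apply(controller)) {
                return true; // no partitions remaining; shutdown may proceed
            }
            // ... sleep for controlledShutdownRetryBackoffMs here ...
        }
        return false;
    }
}
```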

31. KafkaController:
31.1 remove unused import java.util.concurrent.{Semaphore
31.2 I think we still need to set shuttingDownBrokerIds to empty in 
onControllerFailover(). A controller may failover during a controlled shutdown 
and later regain the controllership. OnBrokerFailure() is only called if the 
controller is active. So shuttingDownBrokerIds may not be empty when the 
controllership switches back.
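The underlying rule in 31.2 is that per-controllership state has to be
reinitialized when a broker (re)gains the controllership, because callbacks
such as onBrokerFailure() only run on the active controller. A hypothetical
sketch, with simplified names rather than the real KafkaController fields:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of review comment 31.2: a shutdown in flight when the controller
// resigns would otherwise leak into its next controllership.
class ControllerFailoverSketch {
    final Set<Integer> shuttingDownBrokerIds = new HashSet<>();
    private boolean active = false;

    void shutdownBroker(int brokerId) {
        shuttingDownBrokerIds.add(brokerId);
        // ... move leaders off the broker, send StopReplica requests, etc. ...
    }

    void onControllerResignation() {
        active = false; // stale state is intentionally left behind here
    }

    void onControllerFailover() {
        // Reset all per-controllership state before acting as controller again.
        shuttingDownBrokerIds.clear();
        active = true;
    }
}
```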

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
>



[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673659#comment-13673659
 ] 

Neha Narkhede commented on KAFKA-927:
-

+1 on v3 other than Jun's comments.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
>



[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Sriram Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673666#comment-13673666
 ] 

Sriram Subramanian commented on KAFKA-927:
--

30.1 I don't feel strongly about this. I think it makes things less readable 
with not much savings.
30.2 The new broker registration includes the host and port, and hence it 
works.

31.1 Done.
31.2 This is already there in the previous patch. It is in 
InitializeControllerContext.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, 
> KAFKA-927-v3-removeimports.patch
>



[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-927:
-

Attachment: KAFKA-927-v3-removeimports.patch

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, 
> KAFKA-927-v3-removeimports.patch
>



[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-927:
-

Attachment: KAFKA-927-v4.patch

From offline feedback:
1. Reset the startupComplete flag on shutdown for the unit test.
2. Cleaned up the channel before shutting down.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, 
> KAFKA-927-v3-removeimports.patch, KAFKA-927-v4.patch
>
>
> The controlled shutdown mechanism should be integrated into the software for 
> better operational benefits. A few optimizations can also be done to reduce 
> unnecessary RPC and ZK calls. This patch has been tested on a production-like 
> environment by doing rolling bounces continuously for a day. The average time 
> of doing a rolling bounce with controlled shutdown for a 7-node cluster 
> without this patch is 340 seconds; with this patch it drops to 220 seconds. 
> The patch also ensures correctness in scenarios where the controller shrinks 
> the ISR and the new leader could place the broker being shut down back into 
> the ISR.



[jira] [Created] (KAFKA-930) Integrate preferred replica election logic into kafka

2013-06-03 Thread Sriram Subramanian (JIRA)
Sriram Subramanian created KAFKA-930:


 Summary: Integrate preferred replica election logic into kafka
 Key: KAFKA-930
 URL: https://issues.apache.org/jira/browse/KAFKA-930
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
 Fix For: 0.9


It seems useful to integrate the preferred replica election logic into the kafka 
controller. A simple way to implement this would be to have a background thread 
that periodically finds the topic partitions that are not led by the preferred 
broker and initiates the move. We could come up with some heuristics to initiate 
the move only if the imbalance is over a specific threshold, in order to avoid 
rebalancing too aggressively. Making the software do this reduces operational 
cost.
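The threshold heuristic could look like this. A minimal sketch under assumed data shapes; none of these names come from the Kafka code:

```python
def partitions_to_move(assignments, leaders, imbalance_threshold=0.1):
    """assignments: {partition: [replica, ...]} where the first replica in
    the list is the preferred one; leaders: {partition: current leader id}.
    Returns the partitions whose leader is not the preferred replica, but
    only when the imbalanced fraction exceeds the threshold, so the
    controller avoids rebalancing too aggressively."""
    imbalanced = [p for p, replicas in assignments.items()
                  if leaders.get(p) != replicas[0]]
    if len(imbalanced) > imbalance_threshold * len(assignments):
        return imbalanced
    return []
```

A background thread would call this periodically and trigger a preferred replica election only for the returned partitions.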



[jira] [Resolved] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-927.
---

   Resolution: Fixed
Fix Version/s: 0.8

Thanks for patch v4. +1 and committed to 0.8.

> Integrate controlled shutdown into kafka shutdown hook
> --
>
> Key: KAFKA-927
> URL: https://issues.apache.org/jira/browse/KAFKA-927
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Fix For: 0.8
>
> Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, 
> KAFKA-927-v3-removeimports.patch, KAFKA-927-v4.patch
>
>
> The controlled shutdown mechanism should be integrated into the software for 
> better operational benefits. A few optimizations can also be done to reduce 
> unnecessary RPC and ZK calls. This patch has been tested on a production-like 
> environment by doing rolling bounces continuously for a day. The average time 
> of doing a rolling bounce with controlled shutdown for a 7-node cluster 
> without this patch is 340 seconds; with this patch it drops to 220 seconds. 
> The patch also ensures correctness in scenarios where the controller shrinks 
> the ISR and the new leader could place the broker being shut down back into 
> the ISR.



[jira] [Commented] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-06-03 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673781#comment-13673781
 ] 

Jay Kreps commented on KAFKA-903:
-

+1

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on windows, 
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of 
> /controller_epoch changed sent to 
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:78) - Processing request:: 
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe 
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists 
> cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] 
> (ClientCnxn.java:838) - Reading reply sessionid:0x1
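The swap failure above is consistent with Windows rename semantics: a plain rename fails when the target file already exists. A hedged sketch of a portable swap (illustrative only, not the committed fix):

```python
import os

def swap_checkpoint(tmp_path, final_path):
    """Replace final_path with the freshly written tmp_path. A bare
    os.rename fails on Windows whenever final_path already exists;
    os.replace overwrites the target, and the fallback deletes the
    stale file first for platforms or runtimes where that still fails."""
    try:
        os.replace(tmp_path, final_path)
    except OSError:
        if os.path.exists(final_path):
            os.remove(final_path)  # clear the old checkpoint, then rename
        os.rename(tmp_path, final_path)
```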

[jira] [Updated] (KAFKA-905) Logs can have same offsets causing recovery failure

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-905:
-

Attachment: KAFKA-905-v2.patch

- made the logging changes
- added the missing file

> Logs can have same offsets causing recovery failure
> ---
>
> Key: KAFKA-905
> URL: https://issues.apache.org/jira/browse/KAFKA-905
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Fix For: 0.8
>
> Attachments: KAFKA-905.patch, KAFKA-905.rtf, KAFKA-905-v2.patch
>
>
> Consider the following scenario -
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6      5  m5,m6
> HW = 6        HW = 4
> The follower goes down and comes back up, and truncates its log to its HW.
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6
> HW = 6        HW = 4
> Before the follower catches up with the leader, the leader goes down and the
> follower becomes the leader. It then gets new messages.
> F              L
> 1  m1,m2      1   m1,m2
> 3  m3,m4      3   m3,m4
> 5  m5,m6      10  m5-m10
> HW = 6        HW = 4
> The follower fetches from offset 7. Since offset 7 is within the compressed
> message 10 on the leader, the whole message chunk is sent to the follower.
> F              L
> 1   m1,m2     1   m1,m2
> 3   m3,m4     3   m3,m4
> 5   m5,m6     10  m5-m10
> 10  m5-m10
> HW = 4        HW = 10
> The follower logs now contain repeated offsets. On recovery, re-indexing will
> fail due to the repeated offsets.
> Possible ways to fix this -
> 1. The fetcher thread can do deep iteration instead of shallow iteration and
> drop the offsets that are less than the log end offset. This would, however,
> incur a performance hit.
> 2. To optimize option 1, we could do the deep iteration until the logical
> offset of the fetched message set is greater than the log end offset of the
> follower log, and then switch to shallow iteration.
> 3. On recovery, we just truncate the active segment and refetch the data.
> All three of the above approaches are hacky. The right fix is to ensure we
> never corrupt the logs. We can incur data loss but should not compromise
> consistency. For 0.8, the easiest and simplest fix would be 3.
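Options 1 and 2 above amount to decompressing the fetched chunk and dropping the offsets the follower already has before appending. A minimal sketch (hypothetical names, not the real fetcher code):

```python
def append_without_duplicates(follower_log, fetched, log_end_offset):
    """follower_log: list of (offset, message) pairs; fetched: the
    deep-iterated (decompressed) messages from the leader's chunk.
    Drops any fetched message whose offset is below the follower's log
    end offset, so the log never ends up with repeated offsets."""
    for offset, message in fetched:
        if offset < log_end_offset:
            continue  # this offset is already present locally; skip it
        follower_log.append((offset, message))
    return follower_log
```

In the scenario above, the follower (log end offset 7) receives the decompressed chunk m5-m10 and appends only offsets 7 through 10.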



[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-903:
--

   Resolution: Fixed
Fix Version/s: 0.8
 Assignee: Jun Rao
   Status: Resolved  (was: Patch Available)

Thanks for the review. Committed v3 to 0.8.

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Assignee: Jun Rao
>Priority: Blocker
> Fix For: 0.8
>
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on windows, 
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of 
> /controller_epoch changed sent to 
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:78) - Processing request:: 
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe 
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists 
> cxid:0x1d zxid:0xfffe 

[jira] [Resolved] (KAFKA-905) Logs can have same offsets causing recovery failure

2013-06-03 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-905.
-

Resolution: Fixed

Thanks for v2, committed it to 0.8

> Logs can have same offsets causing recovery failure
> ---
>
> Key: KAFKA-905
> URL: https://issues.apache.org/jira/browse/KAFKA-905
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Fix For: 0.8
>
> Attachments: KAFKA-905.patch, KAFKA-905.rtf, KAFKA-905-v2.patch
>
>
> Consider the following scenario -
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6      5  m5,m6
> HW = 6        HW = 4
> The follower goes down and comes back up, and truncates its log to its HW.
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6
> HW = 6        HW = 4
> Before the follower catches up with the leader, the leader goes down and the
> follower becomes the leader. It then gets new messages.
> F              L
> 1  m1,m2      1   m1,m2
> 3  m3,m4      3   m3,m4
> 5  m5,m6      10  m5-m10
> HW = 6        HW = 4
> The follower fetches from offset 7. Since offset 7 is within the compressed
> message 10 on the leader, the whole message chunk is sent to the follower.
> F              L
> 1   m1,m2     1   m1,m2
> 3   m3,m4     3   m3,m4
> 5   m5,m6     10  m5-m10
> 10  m5-m10
> HW = 4        HW = 10
> The follower logs now contain repeated offsets. On recovery, re-indexing will
> fail due to the repeated offsets.
> Possible ways to fix this -
> 1. The fetcher thread can do deep iteration instead of shallow iteration and
> drop the offsets that are less than the log end offset. This would, however,
> incur a performance hit.
> 2. To optimize option 1, we could do the deep iteration until the logical
> offset of the fetched message set is greater than the log end offset of the
> follower log, and then switch to shallow iteration.
> 3. On recovery, we just truncate the active segment and refetch the data.
> All three of the above approaches are hacky. The right fix is to ensure we
> never corrupt the logs. We can incur data loss but should not compromise
> consistency. For 0.8, the easiest and simplest fix would be 3.



[jira] [Created] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-931:
-

 Summary: make zookeeper.connect a required property
 Key: KAFKA-931
 URL: https://issues.apache.org/jira/browse/KAFKA-931
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8
Reporter: Jun Rao
Assignee: Jun Rao


Currently, zookeeper.connect defaults to a null string. If this property is not 
overridden, we get a confusing NullPointerException from the ZK client. It would 
be better to make zookeeper.connect a required property.
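The intent is to fail fast at config-parsing time rather than with a NullPointerException deep inside the ZK client. A minimal sketch of such a check (hypothetical helper, not Kafka's actual config code):

```python
def required_property(props, name):
    """Return props[name], raising a clear error when the property is
    missing or blank instead of letting a null propagate into the ZK
    client and surface later as a NullPointerException."""
    value = props.get(name)
    if value is None or not str(value).strip():
        raise ValueError(f"Missing required configuration property '{name}'")
    return value
```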



[jira] [Updated] (KAFKA-905) Logs can have same offsets causing recovery failure

2013-06-03 Thread Sriram Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Subramanian updated KAFKA-905:
-

Attachment: KAFKA-905-trunk.patch

changes for trunk

> Logs can have same offsets causing recovery failure
> ---
>
> Key: KAFKA-905
> URL: https://issues.apache.org/jira/browse/KAFKA-905
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Sriram Subramanian
>Assignee: Sriram Subramanian
> Fix For: 0.8
>
> Attachments: KAFKA-905.patch, KAFKA-905.rtf, KAFKA-905-trunk.patch, 
> KAFKA-905-v2.patch
>
>
> Consider the following scenario -
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6      5  m5,m6
> HW = 6        HW = 4
> The follower goes down and comes back up, and truncates its log to its HW.
> L              F
> 1  m1,m2      1  m1,m2
> 3  m3,m4      3  m3,m4
> 5  m5,m6
> HW = 6        HW = 4
> Before the follower catches up with the leader, the leader goes down and the
> follower becomes the leader. It then gets new messages.
> F              L
> 1  m1,m2      1   m1,m2
> 3  m3,m4      3   m3,m4
> 5  m5,m6      10  m5-m10
> HW = 6        HW = 4
> The follower fetches from offset 7. Since offset 7 is within the compressed
> message 10 on the leader, the whole message chunk is sent to the follower.
> F              L
> 1   m1,m2     1   m1,m2
> 3   m3,m4     3   m3,m4
> 5   m5,m6     10  m5-m10
> 10  m5-m10
> HW = 4        HW = 10
> The follower logs now contain repeated offsets. On recovery, re-indexing will
> fail due to the repeated offsets.
> Possible ways to fix this -
> 1. The fetcher thread can do deep iteration instead of shallow iteration and
> drop the offsets that are less than the log end offset. This would, however,
> incur a performance hit.
> 2. To optimize option 1, we could do the deep iteration until the logical
> offset of the fetched message set is greater than the log end offset of the
> follower log, and then switch to shallow iteration.
> 3. On recovery, we just truncate the active segment and refetch the data.
> All three of the above approaches are hacky. The right fix is to ensure we
> never corrupt the logs. We can incur data loss but should not compromise
> consistency. For 0.8, the easiest and simplest fix would be 3.



[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-931:
--

Attachment: kafka-931.patch

Attached a patch that does the following:
1. Make zookeeper.connect a required property.
2. Change the default value of auto.offset.reset to largest (to be consistent 
with ConsoleConsumer).
3. Change the property name queued.max.messages to make it more intuitive.

> make zookeeper.connect a required property
> --
>
> Key: KAFKA-931
> URL: https://issues.apache.org/jira/browse/KAFKA-931
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Jun Rao
> Attachments: kafka-931.patch
>
>
> Currently, zookeeper.connect defaults to a null string. If this property is 
> not overridden, we get a confusing NullPointerException from the ZK client. 
> It would be better to make zookeeper.connect a required property.



[jira] [Commented] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673858#comment-13673858
 ] 

Neha Narkhede commented on KAFKA-931:
-

+1. Minor suggestion - can we change the default of queued.max.message.chunks 
to 2? As long as there is one more message chunk ready to be processed, that 
is good enough; having more chunks in the queue is somewhat pointless.

> make zookeeper.connect a required property
> --
>
> Key: KAFKA-931
> URL: https://issues.apache.org/jira/browse/KAFKA-931
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Jun Rao
> Attachments: kafka-931.patch
>
>
> Currently, zookeeper.connect defaults to a null string. If this property is 
> not overridden, we get a confusing NullPointerException from the ZK client. 
> It would be better to make zookeeper.connect a required property.



[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-931:
--

   Resolution: Fixed
Fix Version/s: 0.8
   Status: Resolved  (was: Patch Available)

Thanks for the review. Committed to 0.8 with the suggested change.

> make zookeeper.connect a required property
> --
>
> Key: KAFKA-931
> URL: https://issues.apache.org/jira/browse/KAFKA-931
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Jun Rao
> Fix For: 0.8
>
> Attachments: kafka-931.patch
>
>
> Currently, zookeeper.connect defaults to a null string. If this property is 
> not overridden, we get a confusing NullPointerException from the ZK client. 
> It would be better to make zookeeper.connect a required property.



[jira] [Work started] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-931 started by Jun Rao.

> make zookeeper.connect a required property
> --
>
> Key: KAFKA-931
> URL: https://issues.apache.org/jira/browse/KAFKA-931
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Jun Rao
> Attachments: kafka-931.patch
>
>
> Currently, zookeeper.connect defaults to a null string. If this property is 
> not overridden, we get a confusing NullPointerException from the ZK client. 
> It would be better to make zookeeper.connect a required property.



[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-931:
--

Status: Patch Available  (was: In Progress)

> make zookeeper.connect a required property
> --
>
> Key: KAFKA-931
> URL: https://issues.apache.org/jira/browse/KAFKA-931
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Jun Rao
>Assignee: Jun Rao
> Attachments: kafka-931.patch
>
>
> Currently, zookeeper.connect defaults to a null string. If this property is 
> not overridden, we get a confusing NullPointerException from the ZK client. 
> It would be better to make zookeeper.connect a required property.



[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer

2013-06-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673999#comment-13673999
 ] 

Jun Rao commented on KAFKA-917:
---

The patch no longer applies to 0.8. Could you rebase?

> Expose zk.session.timeout.ms in console consumer
> 
>
> Key: KAFKA-917
> URL: https://issues.apache.org/jira/browse/KAFKA-917
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.7, 0.8
>Reporter: Swapnil Ghike
>Assignee: Swapnil Ghike
>Priority: Blocker
>  Labels: bugs
> Fix For: 0.8
>
> Attachments: kafka-917.patch
>
>




[jira] [Resolved] (KAFKA-929) Download link in 0.7 quickstart broken

2013-06-03 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-929.
---

Resolution: Fixed

Thanks for identifying this. Fixed the website.

> Download link in 0.7 quickstart broken
> --
>
> Key: KAFKA-929
> URL: https://issues.apache.org/jira/browse/KAFKA-929
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>Reporter: David Arthur
>
> http://kafka.apache.org/07/quickstart.html
> links to http://kafka.apache.org/07/downloads.html, instead of 
> http://kafka.apache.org/downloads.html
