Re: preparing for the 0.8 final release

2013-09-19 Thread Haithem Jarraya
Hi Jun,

any updates on this? Is the final release planned soon?

Thanks,

Haithem

On 13 Sep 2013, at 17:18, Jun Rao  wrote:

> Hi, Everyone,
> 
> We have been stabilizing the 0.8 branch since the beta1 release. I think we
> are getting close to an 0.8 final release. I made an initial list of the
> remaining jiras that should be fixed in 0.8.
> 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%20%220.8%22%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)
> 
> 1. Do people agree with the list?
> 
> 2. If the list is good, could people help contributing/reviewing the
> remaining jiras?
> 
> Thanks,
> 
> Jun



[jira] [Commented] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue

2013-09-19 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771981#comment-13771981
 ] 

Guozhang Wang commented on KAFKA-1043:
--

IMHO the local time spent processing the fetch response is linear in the number of 
partitions in the request, while the network time spent writing to the socket buffer 
is not; it depends on whether the data is still in the file cache. Hence, following 
either the 1) reset-socket-buffer-size or 2) subset-of-topic-partitions-at-a-time 
approach, we would need to either 1) set the buffer size so small that it is unfair 
to other requests that do not hit I/O and may cause unnecessary round trips, or 
2) fetch such a small subset of topic-partitions that we end up in the same situation 
as 1).

Capping based on time is better since it provides "fairness" but that seems a 
little hacky.

My reasoning for decoupling the socket and network processor is the following. As we 
scale up, the principle should be that "various clients are isolated from each 
other". For a fetch request this means "if you request old data from many topic 
partitions, only your own request should take a long time; other requests should 
not be impacted". Today a request's lifetime on the server is

socket -> network processor -> request handler -> (possible) disk I/O due to 
flush for produce requests -> socket processor -> network I/O

and one way to enable isolation is to ensure that no pair of stages on this path is 
single-threaded. Today socket -> network processor goes through the acceptor, network 
processor -> request handler goes through the request queue, and request handler -> 
(possible) disk I/O due to flush for produce requests was fixed in KAFKA-615; but 
socket processor -> network I/O is still coupled, and fixes to issues caused by this 
coupling end up handling only the "worst case", which does not obey the "isolation" 
principle. 

I agree this is rather complex and would be a long term thing.
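
To make the decoupling concrete, here is a minimal sketch (hypothetical names, not 
the actual SocketServer code) of the last hop, socket processor -> network I/O, 
handed off through a queue drained by its own pool of sender threads, so that one 
slow fetch response only ties up one sender:

    import java.util.concurrent.{Executors, LinkedBlockingQueue}

    // Hypothetical sketch: responses are queued rather than written inline by the
    // processor, and several sender threads drain the queue independently.
    case class Response(connectionId: String, payload: Array[Byte])

    class DecoupledResponseSender(numSenderThreads: Int, writeToSocket: Response => Unit) {
      private val responseQueue = new LinkedBlockingQueue[Response]()
      private val senders = Executors.newFixedThreadPool(numSenderThreads)

      // Request handlers enqueue and return immediately; they never block on network I/O.
      def enqueue(response: Response): Unit = responseQueue.put(response)

      // Each sender thread takes the next response and writes it; a slow write only
      // occupies that one thread instead of stalling every queued response.
      for (_ <- 1 to numSenderThreads)
        senders.submit(new Runnable {
          override def run(): Unit =
            while (!Thread.currentThread().isInterrupted)
              writeToSocket(responseQueue.take())
        })

      def shutdown(): Unit = senders.shutdownNow()
    }

A real implementation would additionally have to preserve per-connection response 
ordering, which this sketch ignores.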

> Time-consuming FetchRequest could block other request in the response queue
> ---
>
> Key: KAFKA-1043
> URL: https://issues.apache.org/jira/browse/KAFKA-1043
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.8, 0.8.1
>
>
> Since in SocketServer the processor that takes a request is also responsible 
> for writing the response for that request, each processor owns its own 
> response queue. If a FetchRequest takes an unusually long time to write to 
> the channel buffer, it blocks all other responses in that queue.



Rebalancing failures during upgrade to latest code

2013-09-19 Thread Sam Meder
The latest consumer changes to read data from Zookeeper during rebalance have 
made the consumer rebalance code incompatible with older versions (making 
rolling upgrades without downtime hard). The problem relates to how partitions 
are ordered. The old code seems to have returned the partitions sorted:

... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic produce-indexable-views 
with consumers: ...

the new code instead uses:

... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 
17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views with 
consumers: ...

This causes new consumers and old consumers to claim the same partitions. I 
realize that this may not be a big deal (although painful for us since it 
disagrees with our deployment automation) since the code wasn't officially 
released, but it seems simple enough to sort the partitions if you'd take such 
a patch.

/Sam
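
For illustration, a minimal sketch of the proposed fix (helper name is hypothetical, 
not the actual ZookeeperConsumerConnector code): normalize the partition IDs read 
from ZooKeeper by sorting them before the assignment runs, so old and new consumers 
agree on the ordering.

    // Hypothetical sketch: ZooKeeper returns child nodes (partition ids) in no
    // particular order, so sort them before handing the list to the assignment logic.
    def partitionsForTopic(zkChildren: Seq[String]): Seq[Int] =
      zkChildren.map(_.toInt).sorted

    // The unsorted ordering observed with the new code...
    val fromZk = Seq("0", "5", "10", "14", "1", "6", "9", "13", "2", "17",
                     "12", "7", "3", "18", "16", "11", "8", "19", "4", "15")
    // ...becomes the old, sorted ordering 0..19 that both consumer versions agree on.
    assert(partitionsForTopic(fromZk) == (0 to 19))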





Re: preparing for the 0.8 final release

2013-09-19 Thread Neha Narkhede
Based on the list above, we may be able to clear up the remaining jiras in
roughly 3 weeks, so we can plan for a final release in a month or so. We
would appreciate contributions and patches to close out the remaining jiras.

Thanks
Neha


On Thu, Sep 19, 2013 at 4:15 AM, Haithem Jarraya wrote:

> Hi Jun,
>
> any updates on this? Is the final release planned soon?
>
> Thanks,
>
> Haithem
>
> On 13 Sep 2013, at 17:18, Jun Rao  wrote:
>
> > Hi, Everyone,
> >
> > We have been stabilizing the 0.8 branch since the beta1 release. I think
> we
> > are getting close to an 0.8 final release. I made an initial list of the
> > remaining jiras that should be fixed in 0.8.
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%20%220.8%22%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)
> >
> > 1. Do people agree with the list?
> >
> > 2. If the list is good, could people help contributing/reviewing the
> > remaining jiras?
> >
> > Thanks,
> >
> > Jun
>
>


Re: Rebalancing failures during upgrade to latest code

2013-09-19 Thread Neha Narkhede
Agreed. This is a regression and is not easy to reason about. This is a
side effect of reading the partitions as a set from zookeeper. Please can
you file a JIRA to get this fixed? Feel free to upload a patch as well.

Thanks,
Neha


On Thu, Sep 19, 2013 at 8:17 AM, Sam Meder wrote:

> The latest consumer changes to read data from Zookeeper during rebalance
> have made the consumer rebalance code incompatible with older versions
> (making rolling upgrades without downtime hard). The problem relates to how
> partitions are ordered. The old code seems to have returned the partitions
> sorted:
>
> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic
> produce-indexable-views with consumers: ...
>
> the new code instead uses:
>
> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13,
> 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic
> produce-indexable-views with consumers: ...
>
> This causes new consumers and old consumers to claim the same partitions.
> I realize that this may not be a big deal (although painful for us since it
> disagrees with our deployment automation) since the code wasn't officially
> released, but it seems simple enough to sort the partitions if you'd take
> such a patch.
>
> /Sam
>
>
>
>


Re: Rebalancing failures during upgrade to latest code

2013-09-19 Thread Guozhang Wang
Hello Sam,

I agree that even with the fix we should still sort the partition list before
handing it to the assignment algorithm. I will try to make a follow-up patch
for this.

Guozhang


On Thu, Sep 19, 2013 at 8:17 AM, Sam Meder wrote:

> The latest consumer changes to read data from Zookeeper during rebalance
> have made the consumer rebalance code incompatible with older versions
> (making rolling upgrades without downtime hard). The problem relates to how
> partitions are ordered. The old code seems to have returned the partitions
> sorted:
>
> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic
> produce-indexable-views with consumers: ...
>
> the new code instead uses:
>
> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13,
> 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic
> produce-indexable-views with consumers: ...
>
> This causes new consumers and old consumers to claim the same partitions.
> I realize that this may not be a big deal (although painful for us since it
> disagrees with our deployment automation) since the code wasn't officially
> released, but it seems simple enough to sort the partitions if you'd take
> such a patch.
>
> /Sam
>
>
>
>


-- 
-- Guozhang


[jira] Subscription: outstanding kafka patches

2013-09-19 Thread jira
Issue Subscription
Filter: outstanding kafka patches (68 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-1049  Encoder implementations are required to provide an undocumented 
constructor.
https://issues.apache.org/jira/browse/KAFKA-1049
KAFKA-1042  Fix segment flush logic
https://issues.apache.org/jira/browse/KAFKA-1042
KAFKA-1032  Messages sent to the old leader will be lost on broker GC resulted 
failure
https://issues.apache.org/jira/browse/KAFKA-1032
KAFKA-1020  Remove getAllReplicasOnBroker from KafkaController
https://issues.apache.org/jira/browse/KAFKA-1020
KAFKA-1012  Implement an Offset Manager and hook offset requests to it
https://issues.apache.org/jira/browse/KAFKA-1012
KAFKA-1011  Decompression and re-compression on MirrorMaker could result in 
messages being dropped in the pipeline
https://issues.apache.org/jira/browse/KAFKA-1011
KAFKA-1008  Unmap before resizing
https://issues.apache.org/jira/browse/KAFKA-1008
KAFKA-1005  kafka.perf.ConsumerPerformance not shutting down consumer
https://issues.apache.org/jira/browse/KAFKA-1005
KAFKA-1004  Handle topic event for trivial whitelist topic filters
https://issues.apache.org/jira/browse/KAFKA-1004
KAFKA-998   Producer should not retry on non-recoverable error codes
https://issues.apache.org/jira/browse/KAFKA-998
KAFKA-997   Provide a strict verification mode when reading configuration 
properties
https://issues.apache.org/jira/browse/KAFKA-997
KAFKA-996   Capitalize first letter for log entries
https://issues.apache.org/jira/browse/KAFKA-996
KAFKA-984   Avoid a full rebalance in cases when a new topic is discovered but 
container/broker set stay the same
https://issues.apache.org/jira/browse/KAFKA-984
KAFKA-982   Logo for Kafka
https://issues.apache.org/jira/browse/KAFKA-982
KAFKA-981   Unable to pull Kafka binaries with Ivy
https://issues.apache.org/jira/browse/KAFKA-981
KAFKA-976   Order-Preserving Mirror Maker Testcase
https://issues.apache.org/jira/browse/KAFKA-976
KAFKA-967   Use key range in ProducerPerformance
https://issues.apache.org/jira/browse/KAFKA-967
KAFKA-956   High-level consumer fails to check topic metadata response for 
errors
https://issues.apache.org/jira/browse/KAFKA-956
KAFKA-946   Kafka Hadoop Consumer fails when verifying message checksum
https://issues.apache.org/jira/browse/KAFKA-946
KAFKA-917   Expose zk.session.timeout.ms in console consumer
https://issues.apache.org/jira/browse/KAFKA-917
KAFKA-885   sbt package builds two kafka jars
https://issues.apache.org/jira/browse/KAFKA-885
KAFKA-881   Kafka broker not respecting log.roll.hours
https://issues.apache.org/jira/browse/KAFKA-881
KAFKA-873   Consider replacing zkclient with curator (with zkclient-bridge)
https://issues.apache.org/jira/browse/KAFKA-873
KAFKA-868   System Test - add test case for rolling controlled shutdown
https://issues.apache.org/jira/browse/KAFKA-868
KAFKA-863   System Test - update 0.7 version of kafka-run-class.sh for 
Migration Tool test cases
https://issues.apache.org/jira/browse/KAFKA-863
KAFKA-859   support basic auth protection of mx4j console
https://issues.apache.org/jira/browse/KAFKA-859
KAFKA-855   Ant+Ivy build for Kafka
https://issues.apache.org/jira/browse/KAFKA-855
KAFKA-854   Upgrade dependencies for 0.8
https://issues.apache.org/jira/browse/KAFKA-854
KAFKA-815   Improve SimpleConsumerShell to take in a max messages config option
https://issues.apache.org/jira/browse/KAFKA-815
KAFKA-745   Remove getShutdownReceive() and other kafka specific code from the 
RequestChannel
https://issues.apache.org/jira/browse/KAFKA-745
KAFKA-735   Add looping and JSON output for ConsumerOffsetChecker
https://issues.apache.org/jira/browse/KAFKA-735
KAFKA-717   scala 2.10 build support
https://issues.apache.org/jira/browse/KAFKA-717
KAFKA-686   0.8 Kafka broker should give a better error message when running 
against 0.7 zookeeper
https://issues.apache.org/jira/browse/KAFKA-686
KAFKA-674   Clean Shutdown Testing - Log segments checksums mismatch
https://issues.apache.org/jira/browse/KAFKA-674
KAFKA-652   Create testcases for clean shut-down
https://issues.apache.org/jira/browse/KAFKA-652
KAFKA-649   Cleanup log4j logging
https://issues.apache.org/jira/browse/KAFKA-649
KAFKA-645   Create a shell script to run System Test with DEBUG details and 
"tee" console output to a file
https://issues.apache.org/jira/browse/KAFKA-645
KAFKA-598   decouple fetch size from max message size
https://issues.apache.org/jira/browse/KAFKA-598
KAFKA-583   SimpleCons

[jira] [Commented] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue

2013-09-19 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771994#comment-13771994
 ] 

Neha Narkhede commented on KAFKA-1043:
--

As Sriram said, we no longer block on a full socket buffer. The real problem is 
large fetch requests, like those coming from a lagging mirror maker, hogging the 
network thread by writing as much as possible while the socket buffer is not full. 
This increases the response send time for all other requests whose responses are 
queued up behind the large fetch request, causing a downward spiral that takes 
quite some time to recover from because of the filled-up response queues. 

We could cap based on size, where we yield the network thread after n MB have been 
written on the wire, giving the rest of the smaller responses a chance to be 
written to the socket. This would ensure that one or a few large fetch requests 
don't penalize many other smaller requests.
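
A rough sketch of that idea (hypothetical names, not Kafka's SocketServer API): write 
at most a fixed number of bytes of a response per turn on the network thread, and 
re-queue the remainder so smaller responses get a turn.

    import java.nio.ByteBuffer
    import java.nio.channels.SocketChannel

    // Hypothetical sketch: returns true if the response is fully sent, false if it
    // should be re-queued behind the other pending responses.
    def writeCapped(channel: SocketChannel, response: ByteBuffer, maxBytesPerTurn: Int): Boolean = {
      var written = 0
      // Stop when the cap is reached, the socket buffer is full (write returns 0),
      // or the response has been fully written.
      while (written < maxBytesPerTurn && response.hasRemaining) {
        val n = channel.write(response)
        if (n == 0) return false // socket buffer full; yield to other responses
        written += n
      }
      !response.hasRemaining
    }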

> Time-consuming FetchRequest could block other request in the response queue
> ---
>
> Key: KAFKA-1043
> URL: https://issues.apache.org/jira/browse/KAFKA-1043
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.8, 0.8.1
>
>
> Since in SocketServer the processor that takes a request is also responsible 
> for writing the response for that request, each processor owns its own 
> response queue. If a FetchRequest takes an unusually long time to write to 
> the channel buffer, it blocks all other responses in that queue.



[jira] [Updated] (KAFKA-1061) Break-down sendTime to multipleSendTime

2013-09-19 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1061:
-

Fix Version/s: 0.8.1

> Break-down sendTime to multipleSendTime
> ---
>
> Key: KAFKA-1061
> URL: https://issues.apache.org/jira/browse/KAFKA-1061
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.8.1
>
>
> After KAFKA-1060 is done, we would also like to break down the sendTime into each 
> MultiSend's time and its corresponding send data size.
> This is related to KAFKA-1043



[jira] [Updated] (KAFKA-1060) Break-down sendTime into responseQueueTime and the real sendTime

2013-09-19 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1060:
-

Fix Version/s: 0.8.1

> Break-down sendTime into responseQueueTime and the real sendTime
> 
>
> Key: KAFKA-1060
> URL: https://issues.apache.org/jira/browse/KAFKA-1060
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.8.1
>
>
> Currently the responseSendTime in updateRequestMetrics actually contains two 
> portions, the responseQueueTime and the real SendTime. We would like to 
> distinguish these two cases.
> This is related to KAFKA-1043



[jira] [Created] (KAFKA-1060) Break-down sendTime into responseQueueTime and the real sendTime

2013-09-19 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-1060:


 Summary: Break-down sendTime into responseQueueTime and the real 
sendTime
 Key: KAFKA-1060
 URL: https://issues.apache.org/jira/browse/KAFKA-1060
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang


Currently the responseSendTime in updateRequestMetrics actually contains two 
portions, the responseQueueTime and the real SendTime. We would like to 
distinguish these two cases

This is related to KAFKA-1043
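
As a rough illustration of the intended split (field and method names are 
assumptions, not the actual metrics code), the response path would record three 
timestamps and report the two intervals separately:

    // Hypothetical sketch: timestamps captured as a response moves through the server.
    case class ResponseTiming(enqueueMs: Long, dequeueMs: Long, sendCompleteMs: Long) {
      // Time the response spent waiting in the processor's response queue.
      def responseQueueTimeMs: Long = dequeueMs - enqueueMs
      // Time actually spent writing the response to the socket.
      def responseSendTimeMs: Long = sendCompleteMs - dequeueMs
    }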



[jira] [Created] (KAFKA-1061) Break-down sendTime to multipleSendTime

2013-09-19 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-1061:


 Summary: Break-down sendTime to multipleSendTime
 Key: KAFKA-1061
 URL: https://issues.apache.org/jira/browse/KAFKA-1061
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang


After KAFKA-1060 is done, we would also like to break down the sendTime into each 
MultiSend's time and its corresponding send data size.

This is related to KAFKA-1043
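
Continuing the sketch from KAFKA-1060 (names are again assumptions, not the actual 
code), the per-MultiSend breakdown could record an elapsed-time and byte-count pair 
for each individual send:

    // Hypothetical sketch: one record per send inside a MultiSend.
    case class SendRecord(elapsedMs: Long, bytesSent: Long)

    class MultiSendTiming {
      private val records = scala.collection.mutable.ArrayBuffer.empty[SendRecord]

      // Wrap each individual send, recording how long it took and how much it wrote.
      def timed(send: () => Long): Long = {
        val start = System.currentTimeMillis()
        val bytes = send()
        records += SendRecord(System.currentTimeMillis() - start, bytes)
        bytes
      }

      def perSendRecords: Seq[SendRecord] = records.toList
    }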



[jira] [Updated] (KAFKA-1060) Break-down sendTime into responseQueueTime and the real sendTime

2013-09-19 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1060:
-

Description: 
Currently the responseSendTime in updateRequestMetrics actually contains two 
portions, the responseQueueTime and the real SendTime. We would like to 
distinguish these two cases.

This is related to KAFKA-1043

  was:
Currently the responseSendTime in updateRequestMetrics actually contains two 
portions, the responseQueueTime and the real SendTime. We would like to 
distinguish these two cases

This is related to KAFKA-1043


> Break-down sendTime into responseQueueTime and the real sendTime
> 
>
> Key: KAFKA-1060
> URL: https://issues.apache.org/jira/browse/KAFKA-1060
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>
> Currently the responseSendTime in updateRequestMetrics actually contains two 
> portions, the responseQueueTime and the real SendTime. We would like to 
> distinguish these two cases.
> This is related to KAFKA-1043



[jira] [Commented] (KAFKA-1018) tidy up the POM from what feedback has come from the 0.8 beta and publishing to maven

2013-09-19 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772450#comment-13772450
 ] 

Neha Narkhede commented on KAFKA-1018:
--

[~joestein] This is marked for the 0.8 final release. Do you think you could 
help look into this?

> tidy up the POM from what feedback has come from the 0.8 beta and publishing 
> to maven
> -
>
> Key: KAFKA-1018
> URL: https://issues.apache.org/jira/browse/KAFKA-1018
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
> Fix For: 0.8
>
>
> from Chris Riccomini 
> 1. Maven central can't resolve it properly (POM is different from Apache 
> release). Have to use Apache release repo directly to get things to work.
> 2. Exclusions must be manually applied even though they exist in Kafka's POM 
> already. I think Maven can handle this automatically, if the POM is done 
> right.
> 3. Weird parent block in Kafka POMs that points to org.apache.
> 4. Would be nice to publish kafka-test jars as well.
> 5. Would be nice to have SNAPSHOT releases off of trunk using a Hudson job.
> Our hypothesis regarding the first issue is that it was caused by duplicate 
> publishing during testing, and it should go away in the future.
> Regarding number 2, I have to explicitly exclude the following when depending 
> on Kafka:
> exclude module: 'jms'
> exclude module: 'jmxtools'
> exclude module: 'jmxri'
> I believe these just need to be excluded from the appropriate jars in the 
> actual SBT build file, to fix this issue. I see JMS is excluded from ZK, but 
> it's probably being pulled in from somewhere else, anyway.
> Regarding number 3, it is indeed listed as something to do on the Apache 
> publication page (http://www.apache.org/dev/publishing-maven-artifacts.html). 
> I can't find an example of anyone using it, but it doesn't seem to be doing 
> any harm.
> Also, regarding your intransitive() call, that is disabling ALL dependencies 
> not just the exclusions, I believe. I think that the "proper" way to do that 
> would be to do what I've done: exclude("jms", "jmxtools", "jmxri"). 
> Regardless, fixing number 2, above, should mean that intransitive()/exclude() 
> are not required.
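
For illustration only, an sbt-style dependency declaration that excludes the three 
modules at the source, so downstream users would not have to repeat the exclusions 
(the group IDs and ZooKeeper version here are assumptions, not taken from Kafka's 
actual build file):

    // Hypothetical sketch of excluding jms/jmxtools/jmxri where the dependency is
    // declared, instead of in every consuming project.
    libraryDependencies += "org.apache.zookeeper" % "zookeeper" % "3.3.4" excludeAll(
      ExclusionRule(organization = "javax.jms",    name = "jms"),
      ExclusionRule(organization = "com.sun.jdmk", name = "jmxtools"),
      ExclusionRule(organization = "com.sun.jmx",  name = "jmxri")
    )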



[jira] [Commented] (KAFKA-956) High-level consumer fails to check topic metadata response for errors

2013-09-19 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772452#comment-13772452
 ] 

Neha Narkhede commented on KAFKA-956:
-

[~smeder] KAFKA-1030 is now checked in and the problem reported in this JIRA 
should be fixed. Can you confirm that and close this JIRA?

> High-level consumer fails to check topic metadata response for errors
> -
>
> Key: KAFKA-956
> URL: https://issues.apache.org/jira/browse/KAFKA-956
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
>Priority: Blocker
> Fix For: 0.8
>
> Attachments: consumer_metadata_fetch.patch
>
>
> In our environment we noticed that consumers would sometimes hang when 
> started too close to starting the Kafka server. I tracked this down and it 
> seems to be related to some code in rebalance 
> (ZookeeperConsumerConnector.scala). In particular the following code seems 
> problematic:
>   val topicsMetadata = ClientUtils.fetchTopicMetadata(myTopicThreadIdsMap.keySet,
>     brokers, config.clientId, config.socketTimeoutMs,
>     correlationId.getAndIncrement).topicsMetadata
>   val partitionsPerTopicMap = new mutable.HashMap[String, Seq[Int]]
>   topicsMetadata.foreach(m => {
>     val topic = m.topic
>     val partitions = m.partitionsMetadata.map(m1 => m1.partitionId)
>     partitionsPerTopicMap.put(topic, partitions)
>   })
> The response is never checked for error, so may not actually contain any 
> partition info! Rebalance goes its merry way, but doesn't know about any 
> partitions so never assigns them...
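
For illustration, a hedged sketch of the missing check, continuing the snippet 
quoted above and assuming the 0.8-era TopicMetadata API with a per-topic errorCode 
(not the exact patch that was applied):

    import kafka.common.ErrorMapping

    // Hypothetical sketch: fail (or back off and retry) instead of silently building
    // an empty partition map when the metadata response carries an error for a topic.
    topicsMetadata.foreach { m =>
      if (m.errorCode != ErrorMapping.NoError)
        throw ErrorMapping.exceptionFor(m.errorCode) // or re-fetch metadata after a backoff
      partitionsPerTopicMap.put(m.topic, m.partitionsMetadata.map(_.partitionId))
    }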



[jira] [Commented] (KAFKA-1008) Unmap before resizing

2013-09-19 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772459#comment-13772459
 ] 

Neha Narkhede commented on KAFKA-1008:
--

ping [~lizziew], [~jkreps]. Could you address Jun's review comments and see if 
we can resolve this JIRA? This is marked for the 0.8 final release

> Unmap before resizing
> -
>
> Key: KAFKA-1008
> URL: https://issues.apache.org/jira/browse/KAFKA-1008
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 0.8
> Environment: Windows, Linux, Mac OS
>Reporter: Elizabeth Wei
>Assignee: Jay Kreps
>  Labels: patch
> Fix For: 0.8
>
> Attachments: KAFKA-0.8-1008-v7.patch, KAFKA-1008-v6.patch, 
> KAFKA-trunk-1008-v7.patch, unmap-v5.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> While I was studying how MappedByteBuffer works, I saw a sharing-violation runtime 
> exception on Windows. I applied what I learned to write a patch that uses an 
> internal OpenJDK API to solve this problem.
> Following Jay's advice, I made a helper method called tryUnmap(). 
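
For illustration, a sketch of what such a helper could look like (the method name 
comes from the ticket; the body is an assumption): release the mapping through the 
JDK's internal cleaner, and fall back to doing nothing if that fails, since the API 
is not public.

    import java.nio.MappedByteBuffer

    // Hypothetical sketch: force-release a MappedByteBuffer before the underlying file
    // is resized. Relies on a non-public JDK API, so any failure is swallowed and the
    // mapping is simply left for the garbage collector to reclaim later.
    def tryUnmap(buffer: MappedByteBuffer): Unit = {
      try {
        buffer match {
          case db: sun.nio.ch.DirectBuffer => Option(db.cleaner()).foreach(_.clean())
          case _ => // no cleaner available; nothing we can do eagerly
        }
      } catch {
        case _: Throwable => // best effort only
      }
    }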



[jira] [Updated] (KAFKA-956) High-level consumer fails to check topic metadata response for errors

2013-09-19 Thread Sam Meder (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Meder updated KAFKA-956:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I can confirm that this issue was fixed by KAFKA-1030

> High-level consumer fails to check topic metadata response for errors
> -
>
> Key: KAFKA-956
> URL: https://issues.apache.org/jira/browse/KAFKA-956
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
>Priority: Blocker
> Fix For: 0.8
>
> Attachments: consumer_metadata_fetch.patch
>
>
> In our environment we noticed that consumers would sometimes hang when 
> started too close to starting the Kafka server. I tracked this down and it 
> seems to be related to some code in rebalance 
> (ZookeeperConsumerConnector.scala). In particular the following code seems 
> problematic:
>   val topicsMetadata = ClientUtils.fetchTopicMetadata(myTopicThreadIdsMap.keySet,
>     brokers, config.clientId, config.socketTimeoutMs,
>     correlationId.getAndIncrement).topicsMetadata
>   val partitionsPerTopicMap = new mutable.HashMap[String, Seq[Int]]
>   topicsMetadata.foreach(m => {
>     val topic = m.topic
>     val partitions = m.partitionsMetadata.map(m1 => m1.partitionId)
>     partitionsPerTopicMap.put(topic, partitions)
>   })
> The response is never checked for error, so may not actually contain any 
> partition info! Rebalance goes its merry way, but doesn't know about any 
> partitions so never assigns them...



[jira] [Closed] (KAFKA-956) High-level consumer fails to check topic metadata response for errors

2013-09-19 Thread Sam Meder (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Meder closed KAFKA-956.
---


> High-level consumer fails to check topic metadata response for errors
> -
>
> Key: KAFKA-956
> URL: https://issues.apache.org/jira/browse/KAFKA-956
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
>Priority: Blocker
> Fix For: 0.8
>
> Attachments: consumer_metadata_fetch.patch
>
>
> In our environment we noticed that consumers would sometimes hang when 
> started too close to starting the Kafka server. I tracked this down and it 
> seems to be related to some code in rebalance 
> (ZookeeperConsumerConnector.scala). In particular the following code seems 
> problematic:
>   val topicsMetadata = ClientUtils.fetchTopicMetadata(myTopicThreadIdsMap.keySet,
>     brokers, config.clientId, config.socketTimeoutMs,
>     correlationId.getAndIncrement).topicsMetadata
>   val partitionsPerTopicMap = new mutable.HashMap[String, Seq[Int]]
>   topicsMetadata.foreach(m => {
>     val topic = m.topic
>     val partitions = m.partitionsMetadata.map(m1 => m1.partitionId)
>     partitionsPerTopicMap.put(topic, partitions)
>   })
> The response is never checked for error, so may not actually contain any 
> partition info! Rebalance goes its merry way, but doesn't know about any 
> partitions so never assigns them...



[jira] [Created] (KAFKA-1062) Reading topic metadata from zookeeper leads to incompatible ordering of partition list

2013-09-19 Thread Sam Meder (JIRA)
Sam Meder created KAFKA-1062:


 Summary: Reading topic metadata from zookeeper leads to 
incompatible ordering of partition list
 Key: KAFKA-1062
 URL: https://issues.apache.org/jira/browse/KAFKA-1062
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8
Reporter: Sam Meder
Assignee: Neha Narkhede


The latest consumer changes to read data from Zookeeper during rebalance have 
made the consumer rebalance code incompatible with older versions (making 
rolling upgrades without downtime hard). The problem relates to how partitions 
are ordered. The old code seems to have returned the partitions sorted:

... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic produce-indexable-views 
with consumers: ...

the new code instead uses:

... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 
17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views with 
consumers: ...

This causes new consumers and old consumers to claim the same partitions. I 
realize that this may not be a big deal (although painful for us since it 
disagrees with our deployment automation) since the code wasn't officially 
released, but it seems simple enough to sort the partitions if you'd take such 
a patch.




[jira] [Updated] (KAFKA-1062) Reading topic metadata from zookeeper leads to incompatible ordering of partition list

2013-09-19 Thread Sam Meder (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Meder updated KAFKA-1062:
-

Attachment: sorted.patch

> Reading topic metadata from zookeeper leads to incompatible ordering of 
> partition list
> --
>
> Key: KAFKA-1062
> URL: https://issues.apache.org/jira/browse/KAFKA-1062
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
> Attachments: sorted.patch
>
>
> The latest consumer changes to read data from Zookeeper during rebalance have 
> made the consumer rebalance code incompatible with older versions (making 
> rolling upgrades without downtime hard). The problem relates to how 
> partitions are ordered. The old code seems to have returned the partitions 
> sorted:
> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 
> 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic 
> produce-indexable-views with consumers: ...
> the new code instead uses:
> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 
> 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views 
> with consumers: ...
> This causes new consumers and old consumers to claim the same partitions. I 
> realize that this may not be a big deal (although painful for us since it 
> disagrees with our deployment automation) since the code wasn't officially 
> released, but it seems simple enough to sort the partitions if you'd take 
> such a patch.



[jira] [Updated] (KAFKA-1062) Reading topic metadata from zookeeper leads to incompatible ordering of partition list

2013-09-19 Thread Sam Meder (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Meder updated KAFKA-1062:
-

Status: Patch Available  (was: Open)

> Reading topic metadata from zookeeper leads to incompatible ordering of 
> partition list
> --
>
> Key: KAFKA-1062
> URL: https://issues.apache.org/jira/browse/KAFKA-1062
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
> Attachments: sorted.patch
>
>
> The latest consumer changes to read data from Zookeeper during rebalance have 
> made the consumer rebalance code incompatible with older versions (making 
> rolling upgrades without downtime hard). The problem relates to how 
> partitions are ordered. The old code seems to have returned the partitions 
> sorted:
> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 
> 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic 
> produce-indexable-views with consumers: ...
> the new code instead uses:
> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 
> 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views 
> with consumers: ...
> This causes new consumers and old consumers to claim the same partitions. I 
> realize that this may not be a big deal (although painful for us since it 
> disagrees with our deployment automation) since the code wasn't officially 
> released, but it seems simple enough to sort the partitions if you'd take 
> such a patch.



Re: Rebalancing failures during upgrade to latest code

2013-09-19 Thread Sam Meder
Filed KAFKA-1062, including trivial patch.

/Sam

On Sep 19, 2013, at 5:52 PM, Neha Narkhede  wrote:

> Agreed. This is a regression and is not easy to reason about. This is a
> side effect of reading the partitions as a set from zookeeper. Please can
> you file a JIRA to get this fixed? Feel free to upload a patch as well.
> 
> Thanks,
> Neha
> 
> 
> On Thu, Sep 19, 2013 at 8:17 AM, Sam Meder wrote:
> 
>> The latest consumer changes to read data from Zookeeper during rebalance
>> have made the consumer rebalance code incompatible with older versions
>> (making rolling upgrades without downtime hard). The problem relates to how
>> partitions are ordered. The old code seems to have returned the partitions
>> sorted:
>> 
>> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6,
>> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic
>> produce-indexable-views with consumers: ...
>> 
>> the new code instead uses:
>> 
>> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13,
>> 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic
>> produce-indexable-views with consumers: ...
>> 
>> This causes new consumers and old consumers to claim the same partitions.
>> I realize that this may not be a big deal (although painful for us since it
>> disagrees with our deployment automation) since the code wasn't officially
>> released, but it seems simple enough to sort the partitions if you'd take
>> such a patch.
>> 
>> /Sam
>> 
>> 
>> 
>> 



[jira] [Commented] (KAFKA-1062) Reading topic metadata from zookeeper leads to incompatible ordering of partition list

2013-09-19 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772634#comment-13772634
 ] 

Guozhang Wang commented on KAFKA-1062:
--

+1. Thanks for the patch.

Guozhang

> Reading topic metadata from zookeeper leads to incompatible ordering of 
> partition list
> --
>
> Key: KAFKA-1062
> URL: https://issues.apache.org/jira/browse/KAFKA-1062
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Sam Meder
>Assignee: Neha Narkhede
> Attachments: sorted.patch
>
>
> The latest consumer changes to read data from Zookeeper during rebalance have 
> made the consumer rebalance code incompatible with older versions (making 
> rolling upgrades without downtime hard). The problem relates to how 
> partitions are ordered. The old code seems to have returned the partitions 
> sorted:
> ... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 
> 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic 
> produce-indexable-views with consumers: ...
> the new code instead uses:
> ... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 
> 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views 
> with consumers: ...
> This causes new consumers and old consumers to claim the same partitions. I 
> realize that this may not be a big deal (although painful for us since it 
> disagrees with our deployment automation) since the code wasn't officially 
> released, but it seems simple enough to sort the partitions if you'd take 
> such a patch.
