[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137011#comment-14137011
 ] 

nicu marasoiu commented on KAFKA-1282:
--

Hi,

Unfortunately the client used in console-producer is not robust with respect to 
disconnections, as below. Is this the "old" scala producer, and can we hope for 
a resilient behaviour that I can test with the new java producer?

[2014-09-17 12:44:12,009] WARN Failed to send producer request with correlation 
id 15 to broker 0 with data for partitions [topi,0] 
(kafka.producer.async.DefaultEventHandler)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:149)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:483)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)


> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_the_ultimate,_close_max_one_per_select.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch, 
> idleDisconnect.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137011#comment-14137011
 ] 

nicu marasoiu edited comment on KAFKA-1282 at 9/17/14 9:53 AM:
---

Hi,

Unfortunately the client used in console-producer is not very robust with 
respect to disconnections, as I will detail below. Is this the "old" scala 
producer, and can we hope for a resilient behaviour that I can test with the 
new java producer?

More specifically, the connection is closed from the broker side, but the 
producer is unaware of this. The first message after the close is lost (and is 
not retried later). The second message sees the broken channel, outputs the 
exception below, reconnects, and is successfully retried; I can see it 
consumed.

[2014-09-17 12:44:12,009] WARN Failed to send producer request with correlation 
id 15 to broker 0 with data for partitions [topi,0] 
(kafka.producer.async.DefaultEventHandler)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:149)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:483)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)



was (Author: nmarasoi):
Hi,

Unfortunately the client used in console-producer is not robust with respect to 
disconnections, as below. Is this the "old" scala producer, and can we hope for 
a resilient behaviour that I can test with the new java producer?

[2014-09-17 12:44:12,009] WARN Failed to send producer request with correlation 
id 15 to broker 0 with data for partitions [topi,0] 
(kafka.producer.async.DefaultEventHandler)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:149)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:483)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)


> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_the_ultimate,_close_max_one_per_select.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch, 
> idleDisconnect.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nicu marasoiu updated KAFKA-1282:
-
Attachment: 1282_brushed_up.patch

Re-attached the fixed patch, but we may have a blocker for the whole solution 
on the broker side; please see the comment above/below (the first message after 
a disconnect is lost on the client used in the console producer).

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nicu marasoiu updated KAFKA-1282:
-
Attachment: (was: 1282_the_ultimate,_close_max_one_per_select.patch)

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nicu marasoiu updated KAFKA-1282:
-
Attachment: (was: idleDisconnect.patch)

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1635) Java doc of makeLeaders in ReplicaManager is wrong

2014-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137044#comment-14137044
 ] 

ASF GitHub Bot commented on KAFKA-1635:
---

Github user LantaoJin closed the pull request at:

https://github.com/apache/kafka/pull/33


> Java doc of makeLeaders in ReplicaManager is wrong
> --
>
> Key: KAFKA-1635
> URL: https://issues.apache.org/jira/browse/KAFKA-1635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, replication
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Trivial
>  Labels: doc, server
> Fix For: 0.8.2
>
> Attachments: kafka-1635-1.patch
>
>
> ReplicaManager has an incorrect java doc. The overview of the function 
> makeLeaders() is the same as makeFollowers().
> Also see commit at 
> https://github.com/apache/kafka/commit/6739a8e601331ad07d9856dc351785351755a5d5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-1635: Fixed incorrect java doc of makeLe...

2014-09-17 Thread LantaoJin
Github user LantaoJin closed the pull request at:

https://github.com/apache/kafka/pull/33


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137118#comment-14137118
 ] 

nicu marasoiu commented on KAFKA-1282:
--

here is a timeline:

he -> produced
he -> consumed
[ wait beyond timeout here, connection got closed underneath by the other side]
[2014-09-17 15:02:28,689] INFO Got user-level KeeperException when processing 
sessionid:0x148837ce181 type:setData cxid:0x24 zxid:0xec txntype:-1 
reqpath:n/a Error Path:/consumers/console-consumer-87959/offsets/topi/0 
Error:KeeperErrorCode = NoNode for 
/consumers/console-consumer-87959/offsets/topi/0 
(org.apache.zookeeper.server.PrepRequestProcessor)
[2014-09-17 15:02:28,691] INFO Got user-level KeeperException when processing 
sessionid:0x148837ce181 type:create cxid:0x25 zxid:0xed txntype:-1 
reqpath:n/a Error Path:/consumers/console-consumer-87959/offsets 
Error:KeeperErrorCode = NoNode for /consumers/console-consumer-87959/offsets 
(org.apache.zookeeper.server.PrepRequestProcessor)
dd --> produce attempt (never retried, or never reached the broker 
or at least never reached the consumer)
[many seconds wait, to see if the message is being retried; apparently not, 
even though the default retry is 3 times]
w --> new attempt (immediately I see the message below with 
the stack trace, and reconnect + retry is instantly successful)
[2014-09-17 15:03:12,599] WARN Failed to send producer request with correlation 
id 9 to broker 0 with data for partitions [topi,0] 
(kafka.producer.async.DefaultEventHandler)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:149)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:483)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
at 
kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
at 
kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
at 
kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at 
kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
at 
kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
at 
kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
at 
kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
at 
kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:106)
at 
kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:100)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at 
kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:100)
at 
kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72)
at 
kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
at 
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
at 
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67)
at scala.collection.immutable.Stream.foreach(Stream.scala:547)
at 
kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66)
at 

[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread Nicolae Marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137130#comment-14137130
 ] 

Nicolae Marasoiu commented on KAFKA-1282:
-

In fact, this is something that needs fixing in the producer(s) anyway, but the 
issue is with the currently deployed producers.
One of the main reasons to go with a broker-side close of the idle connections 
was that it is easier to redeploy brokers than producers.
But if this is indeed a bug in the producer(s), as I reproduced, those producers 
would need a redeploy anyway.
So moving this to the producer side as a configuration may again be an option 
on the table.

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-09-17 Thread Nicolae Marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137181#comment-14137181
 ] 

Nicolae Marasoiu commented on KAFKA-1461:
-

Hi,

So I guess in this block:

try {
  trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, fetchRequest))
  response = simpleConsumer.fetch(fetchRequest)
} catch {
  case t: Throwable =>
    if (isRunning.get) {
      warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, t.toString))
      partitionMapLock synchronized {
        partitionsWithError ++= partitionMap.keys
      }
    }
}
Should I add a case for the specific scenario of connection 
timeout/refused/reset and introduce a backoff on that path?
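
For example, a rough sketch of what I have in mind (the ConnectException case 
and the fetchBackoffMs value are illustrative only, not an actual patch):

try {
  trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, fetchRequest))
  response = simpleConsumer.fetch(fetchRequest)
} catch {
  // illustrative: back off on connection-level failures before retrying
  case e: java.net.ConnectException =>
    if (isRunning.get) {
      warn("Connection error in fetch %s, backing off %d ms".format(fetchRequest, fetchBackoffMs))
      Thread.sleep(fetchBackoffMs) // fetchBackoffMs would be a new config
      partitionMapLock synchronized {
        partitionsWithError ++= partitionMap.keys
      }
    }
  case t: Throwable =>
    if (isRunning.get) {
      warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, t.toString))
      partitionMapLock synchronized {
        partitionsWithError ++= partitionMap.keys
      }
    }
}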

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1633) Data loss if broker is killed

2014-09-17 Thread gautham varada (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137220#comment-14137220
 ] 

gautham varada commented on KAFKA-1633:
---

Jun, can you please verify our findings in your env? We are able to 
consistently reproduce this issue.

> Data loss if broker is killed
> -
>
> Key: KAFKA-1633
> URL: https://issues.apache.org/jira/browse/KAFKA-1633
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
> Environment: centos 6.3, open jdk 7
>Reporter: gautham varada
>Assignee: Jun Rao
>
> We have a 2 node kafka cluster, we experienced data loss when we did a kill 
> -9 on the brokers.  We also found a work around to prevent this loss.
> Replication factor :2, 4 partitions
> Steps to reproduce
> 1. Create a 2 node cluster with replication factor 2, num partitions 4
> 2. We used Jmeter to pump events
> 3. We used kafka web console to inspect the log size after the test
> During the test, we simultaneously killed the brokers using kill -9 and we 
> tallied the metrics reported by jmeter and the size we observed in the web 
> console, we lost tons of messages.
> We went back and set the Producer retry to 1 instead of the default 3 and 
> repeated the above tests and we did not lose a single message.
> We repeated the above tests with the Producer retry set to 3 and 1 with a 
> single broker and we observed data loss when the retry was 3 and no loss when 
> the retry was 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: kafka 0.8.1: Producer.send() can block forever when a broker is down

2014-09-17 Thread Jonathan Weeks Gmail

The issue is that even with one down broker, the rest of the cluster is up, but 
unreachable from the producer client in this case, which defeats the high 
availability characteristics of clustering.

For any producer trying to use the service, it is "Russian roulette" whether 
you will get metadata back when asking for topic/partition data.

The ClientUtils code rightly iterates through the broker list looking for the 
metadata in random order, but if the first broker in the list is down, the 
others are never retried in a timely manner.

An example stacktrace shows the problem:

default-dispatcher-3" prio=5 tid=0x7fef131c6000 nid=0x5f03 runnable 
[0x0001146d2000]
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
- locked <0x0007ad4c1b50> (a java.lang.Object)
- locked <0x0007ad4c1b70> (a java.lang.Object)
- locked <0x0007ad4c1b60> (a java.lang.Object)
at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
- locked <0x0007ad3f3408> (a java.lang.Object)
at kafka.producer.SyncProducer.connect(SyncProducer.scala:141)
at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:156)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
- locked <0x0007ad3de648> (a java.lang.Object)
at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
at 
kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
at kafka.utils.Utils$.swallow(Utils.scala:167)
at kafka.utils.Logging$class.swallowError(Logging.scala:106)
at kafka.utils.Utils$.swallowError(Utils.scala:46)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
at kafka.producer.Producer.send(Producer.scala:76)

An eight minute timeout is a non-starter for a clustered (HA) service. One 
would expect the system to respect the request.timeout.ms config setting, which 
it does, unless a broker host is down and happens to be first in the shuffled 
list of brokers to try to get the metadata.

I believe this bug is also exacerbated by the fact that the metadata is 
(rightly) refreshed via the topic.metadata.refresh.interval.ms config setting, 
which defaults to every 10 minutes. AFAIK, this means that if a single broker 
is down, every new producer as well as every existing producer has a 
(1/clusterSize-1) chance of either not starting or hanging for a minimum of 8 
minutes, (assuming the tcp connection code times out), every 10 minutes (or 
whatever topic.metadata.refresh.interval.ms is set to), if I understand 
correctly.  

Initializing the SocketChannel in code that doesn't respect the 
request.timeout.ms setting logically defeats the spirit of the timeout setting, 
as well as making the iteration code in ClientUtils far less useful:

(from fetchTopicMetadata:)

val shuffledBrokers = Random.shuffle(brokers)
while(i < shuffledBrokers.size && !fetchMetaDataSucceeded) {
  val producer: SyncProducer = ProducerPool.createSyncProducer(producerConfig, shuffledBrokers(i))
  info("Fetching metadata from broker %s with correlation id %d for %d topic(s) %s".format(shuffledBrokers(i), correlationId, topics.size, topics))
  try {
    topicMetadataResponse = producer.send(topicMetadataRequest)
Opening the connection with a timeout as Jack suggests seems far preferable to 
the current situation.
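
For reference, a minimal sketch of such a bounded blocking connect 
(connectTimeoutMs is a hypothetical setting; java.net.Socket.connect accepts 
an explicit timeout):

import java.net.InetSocketAddress
import java.nio.channels.SocketChannel

// Sketch only: bound the blocking connect instead of relying on the
// OS-level TCP timeout (8+ minutes on some platforms).
def connectWithTimeout(host: String, port: Int, connectTimeoutMs: Int): SocketChannel = {
  val channel = SocketChannel.open()
  channel.configureBlocking(true)
  // The underlying Socket's connect takes a timeout in milliseconds and
  // throws java.net.SocketTimeoutException when it elapses.
  channel.socket().connect(new InetSocketAddress(host, port), connectTimeoutMs)
  channel
}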

Best Regards,

-Jonathan


On Sep 16, 2014, at 10:08 PM, Jun Rao  wrote:
> Jack,
> 
> If the broker is down, channel.connect() should throw an IOException,
> instead of blocking forever. In your case, is the broker host down? In that
> case, the connect call will likely wait for the default tcp connection
> timeout, which is 8+ mins.
> 
> Thanks,
> 
> Jun
> 
> On Tue, Sep 16, 2014 at 5:43 PM, Jack Foy  wrote:
> 
>> We observe that when a broker is down, Producer.send() can get into a
>> state where it will block forever, even when using the async producer.
>> 
>> When a Producer first sends data, it fetches topic metadata from the
>> broker cluster. To do this, it shuffles the list of hosts in the cluster,
>> then iterates through the list querying each broker.
>> 
>> For each broker in the shuffled list, the Producer creates a SyncProducer
>> and invokes SyncProducer.send().
>> SyncProducer.send() creates a BlockingChannel and invokes
>> BlockingChannel.connect().
>> BlockingChannel.connect() retrieves a java.nio.channels.SocketChannel,
>> sets it to blocking mode, and invokes SocketChannel.connect(), passing the
>> current broker hostname.
>> 
>> If the first broker in the list is nonresponsive, SocketChannel.connect()
>> will wait forever.
>> 
>> I think the correct change is as follows:
>> 
>> diff --git

[jira] [Work started] (KAFKA-1622) project shouldn't require signing to build

2014-09-17 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-1622 started by Ivan Lyutov.
--
> project shouldn't require signing to build
> --
>
> Key: KAFKA-1622
> URL: https://issues.apache.org/jira/browse/KAFKA-1622
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: build, newbie, packaging
> Fix For: 0.8.2
>
>
> we only need signing for uploadArchives that is it
> The project trunk failed to build due to some signing/license checks (the 
> diff I used to get things to build is here: 
> https://gist.github.com/dehora/7e3c0bd75bb2b5d87557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137399#comment-14137399
 ] 

Jun Rao commented on KAFKA-1282:


Interesting. The data loss may have to do with ack=0, which is the default in 
the console producer. Could you try ack=1?
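
For reference, a minimal sketch of setting acks on the old Scala producer 
(broker address and topic are placeholders; the console producer exposes the 
same setting via its --request-required-acks option):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "localhost:9092")
props.put("serializer.class", "kafka.serializer.StringEncoder")
// 0 = fire-and-forget (can lose messages silently on a dropped connection);
// 1 = wait for the partition leader to acknowledge each request
props.put("request.required.acks", "1")

val producer = new Producer[String, String](new ProducerConfig(props))
producer.send(new KeyedMessage[String, String]("topi", "test message"))
producer.close()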

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1622) project shouldn't require signing to build

2014-09-17 Thread Ivan Lyutov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137408#comment-14137408
 ] 

Ivan Lyutov commented on KAFKA-1622:


Created reviewboard https://reviews.apache.org/r/25738/diff/
 against branch apache/trunk

> project shouldn't require signing to build
> --
>
> Key: KAFKA-1622
> URL: https://issues.apache.org/jira/browse/KAFKA-1622
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: build, newbie, packaging
> Fix For: 0.8.2
>
> Attachments: KAFKA-1622.patch
>
>
> we only need signing for uploadArchives that is it
> The project trunk failed to build due to some signing/license checks (the 
> diff I used to get things to build is here: 
> https://gist.github.com/dehora/7e3c0bd75bb2b5d87557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1633) Data loss if broker is killed

2014-09-17 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137405#comment-14137405
 ] 

Jun Rao commented on KAFKA-1633:


It sounds like you killed both brokers at the same time? Then you have as many 
failures as replicas, and there could be data loss.

> Data loss if broker is killed
> -
>
> Key: KAFKA-1633
> URL: https://issues.apache.org/jira/browse/KAFKA-1633
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
> Environment: centos 6.3, open jdk 7
>Reporter: gautham varada
>Assignee: Jun Rao
>
> We have a 2 node kafka cluster, we experienced data loss when we did a kill 
> -9 on the brokers.  We also found a work around to prevent this loss.
> Replication factor :2, 4 partitions
> Steps to reproduce
> 1. Create a 2 node cluster with replication factor 2, num partitions 4
> 2. We used Jmeter to pump events
> 3. We used kafka web console to inspect the log size after the test
> During the test, we simultaneously killed the brokers using kill -9 and we 
> tallied the metrics reported by jmeter and the size we observed in the web 
> console, we lost tons of messages.
> We went back and set the Producer retry to 1 instead of the default 3 and 
> repeated the above tests and we did not lose a single message.
> We repeated the above tests with the Producer retry set to 3 and 1 with a 
> single broker and we observed data loss when the retry was 3 and no loss when 
> the retry was 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25738: Made signing task to execute only when uploadArchives is called

2014-09-17 Thread Ivan Lyutov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25738/
---

Review request for kafka.


Bugs: KAFKA-1622
https://issues.apache.org/jira/browse/KAFKA-1622


Repository: kafka


Description
---

KAFKA-1622 - Made signing task to execute only when uploadArchives is called.


Diffs
-

  build.gradle 74c8c8a9e2f4d9a651181d5337d5a8f07f0cb313 

Diff: https://reviews.apache.org/r/25738/diff/


Testing
---


Thanks,

Ivan Lyutov



[jira] [Updated] (KAFKA-1622) project shouldn't require signing to build

2014-09-17 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1622:
---
Attachment: KAFKA-1622.patch

> project shouldn't require signing to build
> --
>
> Key: KAFKA-1622
> URL: https://issues.apache.org/jira/browse/KAFKA-1622
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: build, newbie, packaging
> Fix For: 0.8.2
>
> Attachments: KAFKA-1622.patch
>
>
> we only need signing for uploadArchives that is it
> The project trunk failed to build due to some signing/license checks (the 
> diff I used to get things to build is here: 
> https://gist.github.com/dehora/7e3c0bd75bb2b5d87557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-09-17 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137407#comment-14137407
 ] 

Guozhang Wang commented on KAFKA-1461:
--

I did not realize this ticket existed, and created the same one here 
(KAFKA-1629). It has a more detailed explanation of the issue, though.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1622) project shouldn't require signing to build

2014-09-17 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1622:
---
Status: Patch Available  (was: In Progress)

> project shouldn't require signing to build
> --
>
> Key: KAFKA-1622
> URL: https://issues.apache.org/jira/browse/KAFKA-1622
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: build, newbie, packaging
> Fix For: 0.8.2
>
> Attachments: KAFKA-1622.patch
>
>
> we only need signing for uploadArchives that is it
> The project trunk failed to build due to some signing/license checks (the 
> diff I used to get things to build is here: 
> https://gist.github.com/dehora/7e3c0bd75bb2b5d87557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1629) Replica fetcher thread need to back off upon getting errors on partitions

2014-09-17 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1629.
--
Resolution: Duplicate

> Replica fetcher thread need to back off upon getting errors on partitions
> -
>
> Key: KAFKA-1629
> URL: https://issues.apache.org/jira/browse/KAFKA-1629
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: newbie++
> Fix For: 0.9.0
>
>
> ReplicaFetcherThread's handlePartitionsWithErrors() function needs to be 
> implemented (currently it is an empty function) such that upon getting errors 
> on these partitions, the fetcher thread will back off the corresponding 
> simple consumer to retry fetching that partition.
> This can happen when there is leader migration, the replica may get a bit 
> delayed receiving the leader ISR update request before keeping retry fetching 
> the old leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: kafka 0.8.1: Producer.send() can block forever when a broker is down

2014-09-17 Thread Neha Narkhede
Makes sense. Please file a JIRA and attach a patch there. It would be great
to add a simple test case as well.

Thanks,
Neha

On Wed, Sep 17, 2014 at 8:25 AM, Jonathan Weeks Gmail <
jonathanbwe...@gmail.com> wrote:

>
> The issue is that even with one down broker, the rest of the cluster is
> up, but unreachable from the producer client in this case, which defeats
> the high availability characteristics of clustering.
>
> For any producer trying to use the service, it is "russian roulette"
> whether you will get meta-data back when asking for topic/partition data.
>
> The ClientUtils code rightly iterates through the broker list looking for
> the metadata in random order, but if the first broker in the list is down,
> the others are never retried in a timely manner.
>
> An example stacktrace shows the problem:
>
> default-dispatcher-3" prio=5 tid=0x7fef131c6000 nid=0x5f03 runnable
> [0x0001146d2000]
>   java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> - locked <0x0007ad4c1b50> (a java.lang.Object)
> - locked <0x0007ad4c1b70> (a java.lang.Object)
> - locked <0x0007ad4c1b60> (a java.lang.Object)
> at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
> - locked <0x0007ad3f3408> (a java.lang.Object)
> at kafka.producer.SyncProducer.connect(SyncProducer.scala:141)
> at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:156)
> at
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
> - locked <0x0007ad3de648> (a java.lang.Object)
> at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
> at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
> at
> kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
> at
> kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
> at kafka.utils.Utils$.swallow(Utils.scala:167)
> at kafka.utils.Logging$class.swallowError(Logging.scala:106)
> at kafka.utils.Utils$.swallowError(Utils.scala:46)
> at
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
> at kafka.producer.Producer.send(Producer.scala:76)
>
> An eight minute timeout is a non-starter for a clustered (HA) service. One
> would expect the system to respect the request.timeout.ms config setting,
> which it does, unless a broker host is down and happens to be first in the
> shuffled list of brokers to try to get the metadata.
>
> I believe this bug is also exacerbated by the fact that the meta data is
> (rightly) refreshed via the topic.metadata.refresh.interval.ms config
> setting, which defaults to every 10 minutes. AFAIK, this means that if a
> single broker is down, every new producer as well as every existing
> producer has a (1/clusterSize-1) chance of either not starting or hanging
> for a minimum of 8 minutes, (assuming the tcp connection code times out),
> every 10 minutes (or whatever topic.metadata.refresh.interval.ms is set
> to), if I understand correctly.
>
> Initializing the SocketChannel in code that doesn't respect the
> request.timeout.ms setting logically defeats the spirit of the timeout
> setting as well makes as the iteration code in ClientUtils far less useful:
>
> (from fetchTopicMetadata:)
> val shuffledBrokers = Random.shuffle(brokers)
> while(i < shuffledBrokers.size && !fetchMetaDataSucceeded) {
>   val producer: SyncProducer =
> ProducerPool.createSyncProducer(producerConfig, shuffledBrokers(i))
>   info("Fetching metadata from broker %s with correlation id %d for %d
> topic(s) %s".format(shuffledBrokers(i), correlationId, topics.size, topics))
>   try {
> topicMetadataResponse = producer.send(topicMetadataRequest)
>
> Opening the connection with a timeout as Jack suggests seems far
> preferable to the current situation.
>
> Best Regards,
>
> -Jonathan
>
>
> On Sep 16, 2014, at 10:08 PM, Jun Rao  wrote:
> > Jack,
> >
> > If the broker is down, channel.connect() should throw an IOException,
> > instead of blocking forever. In your case, is the broker host down? In
> that
> > case, the connect call will likely wait for the default tcp connection
> > timeout, which is 8+ mins.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Sep 16, 2014 at 5:43 PM, Jack Foy  wrote:
> >
> >> We observe that when a broker is down, Producer.send() can get into a
> >> state where it will block forever, even when using the async producer.
> >>
> >> When a Producer first sends data, it fetches topic metadata from the
> >> broker cluster. To do this, it shuffles the list of hosts in the
> cluster,
> >> then iterates through the list querying each broker.
> >>
> >> For each broker in the shuffled list, the Producer creates a
> SyncProducer
> >> and invokes SyncProducer.send().
> >> SyncProducer.send() creates a BlockingChannel and invokes

[jira] [Commented] (KAFKA-1624) building on JDK 8 fails

2014-09-17 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137466#comment-14137466
 ] 

Joe Stein commented on KAFKA-1624:
--

Thanks. So nothing to fix yet, but let's leave this open, as I expect it will 
come up again, so folks know what is going on. We can eventually turn it into 
managing multiple builds for JDK version and Scala version down the road. 

> building on JDK 8 fails
> ---
>
> Key: KAFKA-1624
> URL: https://issues.apache.org/jira/browse/KAFKA-1624
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>  Labels: newbie
> Fix For: 0.9.0
>
>
> {code}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; 
> support was removed in 8.0
> error: error while loading CharSequence, class file 
> '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/CharSequence.class)' is 
> broken
> (class java.lang.RuntimeException/bad constant pool tag 18 at byte 10)
> error: error while loading Comparator, class file 
> '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/util/Comparator.class)' is 
> broken
> (class java.lang.RuntimeException/bad constant pool tag 18 at byte 20)
> error: error while loading AnnotatedElement, class file 
> '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/reflect/AnnotatedElement.class)'
>  is broken
> (class java.lang.RuntimeException/bad constant pool tag 18 at byte 76)
> error: error while loading Arrays, class file 
> '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/util/Arrays.class)' is broken
> (class java.lang.RuntimeException/bad constant pool tag 18 at byte 765)
> /tmp/sbt_53783b12/xsbt/ExtractAPI.scala:395: error: java.util.Comparator does 
> not take type parameters
>   private[this] val sortClasses = new Comparator[Symbol] {
> ^
> 5 errors found
> :core:compileScala FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':core:compileScala'.
> > org.gradle.messaging.remote.internal.PlaceholderException (no error message)
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output.
> BUILD FAILED
> Total time: 1 mins 48.298 secs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1368) Upgrade log4j

2014-09-17 Thread Abhishek Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137492#comment-14137492
 ] 

Abhishek Sharma commented on KAFKA-1368:


Log4j 2 has many advanced features and is much improved over the previous 
version.
Are we thinking along the lines of changing the dependency to Log4j 2?

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>
> Upgrade log4j to at least 1.2.16 ou 1.2.17.
> Usage of EnhancedPatternLayout will be possible.
> It allows to set delimiters around the full log, stacktrace included, making 
> log messages collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: MBeans, dashes, underscores, and KAFKA-1481

2014-09-17 Thread Jun Rao
Bhavesh,

Yes, allowing dot in clientId and topic makes it a bit harder to define the
JMX bean names. I see a couple of solutions here.

1. Disable dot in clientId and topic names. The issue is that dot may
already be used in existing deployment.

2. We can represent the JMX bean name differently in the new producer.
Instead of
  kafka.producer.myclientid:type=mytopic
we could change it to
  kafka.producer:clientId=myclientid,topic=mytopic

I felt that option 2 is probably better since it doesn't affect existing
users.

Otis,

We probably can also use option 2 to address KAFKA-1481. For topic/clientid
specific metrics, we could explicitly specify the metric name so that it
contains "topic=mytopic,clientid=myclientid". That seems to be a much
cleaner way than having all parts included in a single string separated by
'|'.
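
To illustrate, a quick sketch of the two naming styles as JMX ObjectNames, 
plus the splitting problem from KAFKA-1481 (all values are placeholders):

import java.util.Hashtable
import javax.management.ObjectName

// Option 1: everything rides in a single string, forcing consumers of the
// metrics to split the name on delimiter characters.
val option1 = new ObjectName("kafka.producer.myclientid:type=mytopic")

// Option 2: stable domain with clientId/topic as key properties; JMX tools
// can query the keys directly, no string splitting needed.
val props = new Hashtable[String, String]()
props.put("clientId", "myclientid")
props.put("topic", "mytopic")
val option2 = new ObjectName("kafka.producer", props)

// The splitting problem: dashes inside group/topic names break
// delimiter-based parsing of single-string names.
"my-Group-my-Topic-BytesPerSec".split("-")
// -> Array(my, Group, my, Topic, BytesPerSec), not the 3 intended parts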

Thanks,

Jun




On Tue, Sep 16, 2014 at 5:15 PM, Bhavesh Mistry 
wrote:

> HI Otis,
>
> What is migration path ?  If topic with special chars exists already(
> ".","-","|" etc)  in previous version of producer/consumer of Kafka, what
> happens after the upgrade new producer or consumer (kafka version) ?  Also,
> in new producer API (Kafka Trunk), does this enforce the rule about client
> id as well ?
>
> Thanks,
>
> Bhavesh
>
> On Tue, Sep 16, 2014 at 2:09 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi,
> >
> > So maybe I should I should have asked the Q explicitly:
> > Could we commit the patch from
> > https://issues.apache.org/jira/browse/KAFKA-1481 now that, I hope, it's
> > clear what problems the current MBean names can cause?
> >
> > Thanks,
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> > On Mon, Sep 15, 2014 at 10:40 PM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > *Problem:*
> > > Some Kafka 0.8.x MBeans have names composed of things like
> > > <consumer group>-<topic>-<metric>.  Note how dashes are used as delimiters.
> > > When <consumer group> and <topic> don't contain the delimiter character
> > > all is good if you want to extract parts of this MBean name by simply
> > > splitting on the delimiter character.  The problem is that dashes are
> > > allowed in topic and group names, so this splitting doesn't work.
> > > Moreover, underscores are also used as delimiters, and they can also be
> > > used in things like topic names.
> > >
> > > *Example*:
> > > This MBean's name is composed of <group>-<topic>-BytesPerSec:
> > >
> > > kafka.consumer:type="ConsumerTopicMetrics", name="myGroup-myTopic-BytesPerSec"
> > >
> > > Here we can actually split on "-" and extract all 3 parts from the
> > > MBean name:
> > > * consumer group ('myGroup')
> > > * topic ('myTopic')
> > > * metric ('BytesPerSec')
> > >
> > > All good!
> > >
> > > But imagine if I named the group: my-Group
> > > And if I named the topic: my-Topic
> > >
> > > Then we'd have:
> > > kafka.consumer:type="ConsumerTopicMetrics", name="my-Group-my-Topic-BytesPerSec"
> > >
> > > Now splitting on "-" would no longer work!  To extract the "my-Group",
> > > "my-Topic", and "BytesPerSec" parts I would have to know the specific
> > > group name and topic name to look for, and could not use a generic
> > > approach of just splitting the MBean name on the delimiter.
> > >
> > > *Solution*:
> > > The patch in https://issues.apache.org/jira/browse/KAFKA-1481 replaces
> > > all _ and - characters where they are used as delimiters in MBean names
> > > with a "|" character.  Because the "I" character is not allowed in
> topic
> > > names, consumer groups, host names, splitting on this new and unified
> > > delimiter works.
> > >
> > > I hope this explains the problem, the solution, and that this can make
> > > it into the next 0.8.x.
> > >
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> >
>


[jira] [Updated] (KAFKA-1620) Make kafka api protocol implementation public

2014-09-17 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1620:
---
   Resolution: Fixed
Fix Version/s: 0.8.2
   Status: Resolved  (was: Patch Available)

Thanks for patch v2. +1 and committed to trunk.

> Make kafka api protocol implementation public
> -
>
> Key: KAFKA-1620
> URL: https://issues.apache.org/jira/browse/KAFKA-1620
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anton Karamanov
>Assignee: Anton Karamanov
> Fix For: 0.8.2
>
> Attachments: 
> 0001-KAFKA-1620-Make-kafka-api-protocol-implementation-pu.patch, 
> 0002-KAFKA-1620-Make-kafka-api-protocol-implementation-pu.patch
>
>
> Some of the classes which implement the Kafka api protocol, such as 
> {{RequestOrResponse}} and {{FetchRequest}}, are defined as private to the 
> {{kafka}} package. Those classes would be extremely useful for writing 
> custom clients (we're using Scala with Akka and implementing one directly on 
> top of Akka TCP), and don't seem to contain any actual internal logic of 
> Kafka. Therefore it seems like a nice idea to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-17 Thread Balaji Seshadri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137540#comment-14137540
 ] 

Balaji Seshadri commented on KAFKA-1618:


[~nehanarkhede] Please let me know your decision.

> Exception thrown when running console producer with no port number for the 
> broker
> -
>
> Key: KAFKA-1618
> URL: https://issues.apache.org/jira/browse/KAFKA-1618
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Gwen Shapira
>Assignee: BalajiSeshadri
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1618-ALL.patch, KAFKA-1618.patch
>
>
> When running console producer with just "localhost" as the broker list, I get 
> ArrayIndexOutOfBounds exception.
> I expect either a clearer error about arguments or for the producer to 
> "guess" a default port.
> [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
> --broker-list localhost
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
>   at 
> kafka.producer.BrokerPartitionInfo.(BrokerPartitionInfo.scala:32)
>   at 
> kafka.producer.async.DefaultEventHandler.(DefaultEventHandler.scala:41)
>   at kafka.producer.Producer.(Producer.scala:59)
>   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
>   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1123) Broker IPv6 addresses parsed incorrectly

2014-09-17 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137568#comment-14137568
 ] 

Jun Rao commented on KAFKA-1123:


Thanks for the patch. Looks good overall. Some minor comments below.

1. ClientUtils: There is still one place that references "bootstrap url". We 
can change it to use the BOOTSTRAP constant.

2. Broker: redundant import of kafka.util.Utils.

3. DataGenerator, TopicMetadata: unused import kafka.util.Utils

4. UtilsTest.testParseHostPort: seems like it can be testParsePort?

5. ClientUtilsTest: Do we need to test 
"[2001:db8:85a3:8d3:1319:8a2e:370:7348]:1234" twice?

6. Could you rebase?



> Broker IPv6 addresses parsed incorrectly
> 
>
> Key: KAFKA-1123
> URL: https://issues.apache.org/jira/browse/KAFKA-1123
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Andrew Otto
>  Labels: newbie
> Attachments: KAFKA-1123.patch, KAFKA-1123_v2.patch
>
>
> It seems that broker addresses are parsed incorrectly when IPv6 addresses are 
> supplied.  IPv6 addresses have colons in them, and Kafka seems to be 
> interpreting the first : as the address:port separator.
> I have only tried this with the console-producer --broker-list option, so I 
> don't know if this affects anything deeper than the CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] 0.8.2 release branch, "unofficial" release candidates(s), 0.8.1.2 release

2014-09-17 Thread Jun Rao
Harsha,

We are shooting to cut the 0.8.2 branch this month.

Thanks,

Jun

On Tue, Sep 16, 2014 at 11:25 AM, Harsha  wrote:

> Hi All,
>Do we have any ballpark date on the release of 0.8.1.2 or 0.8.2.
> Thanks,
> Harsha
>
>
> On Thu, Sep 11, 2014, at 03:53 PM, Jay Kreps wrote:
> > I agree that a beta for 0.8.2 would be useful. It would also be good
> > to get it in production at LinkedIn before the final version.
> >
> > Sorry about stalling on the security stuff. Things have been a little
> > busy on our side. Let's get 0.8.2 out and then get that broken up into
> > individual JIRAs that people can take on.
> >
> > -Jay
> >
> > On Wed, Sep 10, 2014 at 5:11 PM, Jun Rao  wrote:
> > > Joe,
> > >
> > > Thanks for starting the discussion.
> > >
> > > (1) I made a pass of the open jiras for 0.8.2 and marked a few of them
> as
> > > blockers for now. There are currently 6 blockers. Ideally, we want to
> get
> > > all those fixed before cutting the 0.8.2 branch. The rest of the jiras
> > > don't really have to be fixed in 0.8.2. So, if anyone wants to help on
> > > fixing those blocker jiras, that would be great. Perhaps we can circle
> back
> > > in a couple of weeks and see how much progress we make on those blocker
> > > jiras.
> > >
> > > (2) A beta 0.8.2 may not be a bad idea.
> > >
> > > (3) We can do 0.8.1.2. However, I'd prefer only trivial and critical
> > > patches to back port. The scala 2.11 patch seems ok.
> > >
> > > (4) Yes, we should start updating the wiki once 0.8.2 is cut.
> > >
> > > (5) Yes, we can include kafka-1555 if it can be fixed in time.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > >
> > > On Wed, Sep 3, 2014 at 6:34 PM, Joe Stein 
> wrote:
> > >
> > >> Hey, I wanted to take a quick pulse to see if we are getting closer
> to a
> > >> branch for 0.8.2.
> > >>
> > >> 1) There still seems to be a lot of open issues
> > >>
> > >>
> https://issues.apache.org/jira/browse/KAFKA/fixforversion/12326167/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
> > >> and our 30 day summary is showing issues: 51 created and *34*
> resolved and
> > >> not
> > >> sure how much of that we could really just decide to push off to
> 0.8.3 or
> > >> 0.9.0 vs working on 0.8.2 as stable for release.  There is already so
> much
> > >> goodness on trunk.  I appreciate the double commit pain especially as
> trunk
> > >> and branch drift (ugh).
> > >>
> > >> 2) Also, I wanted to float the idea of after making the 0.8.2 branch
> that I
> > >> would do some unofficial release candidates for folks to test prior to
> > >> release and vote.  What I was thinking was I would build, upload and
> stage
> > >> like I was preparing artifacts for vote but let the community know to
> go in
> > >> and "have at it" well prior to the vote release.  We don't get a lot
> of
> > >> community votes during a release but issues after (which is natural
> because
> > >> of how things are done).  I have seen four Apache projects doing this
> very
> > >> successfully not only have they had less iterations of RC votes
> (sensitive
> > >> to that myself) but the community kicked back issues they saw by
> giving
> > >> them some "pre release" time to go through their own test and staging
> > >> environments as the release are coming about.
> > >>
> > >> 3) Checking again on "should we have a 0.8.1.2" release if folks in
> the
> > >> community find important features (this might be best asked on the
> user
> > >> list maybe not sure) they don't want/can't wait for which wouldn't be
> too
> > >> much pain/dangerous to back port. Two things that spring to the top
> of my
> > >> head are 2.11 Scala support and fixing the source jars.  Both of
> these are
> > >> easy to patch personally I don't mind but want to gauge more from the
> > >> community on this too.  I have heard gripes ad hoc from folks in
> direct
> > >> communication but no complains really in the public forum and wanted
> to
> > >> open the floor if folks had a need.
> > >>
> > >> 4) I feel the 0.9 work is being held up some (or at least the resourcing
> > >> of it, from my perspective).  We decided to hold off on including SSL
> > >> (even though we have a path for it).  Jay did a nice update recently to
> > >> the Security wiki, which I think we should move forward with.  I have
> > >> some more to add/change/update and want to start getting down to more
> > >> details, assigning specific people to specific tasks, but without knowing
> > >> what we are doing when, it is hard to manage.
> > >>
> > >> 5) I just updated https://issues.apache.org/jira/browse/KAFKA-1555.  I
> > >> think it is a really important feature update; it doesn't have to be in
> > >> 0.8.2, but we need consensus (no pun intended).  It fundamentally allows
> > >> requiring that data land in a minimum of two racks, which A LOT of data
> > >> requires for a save to be considered successful.
> > >>
> > >> /***
> > >>  Joe Stein
> > >>  Founder, Principal Consultant
>

[jira] [Commented] (KAFKA-1368) Upgrade log4j

2014-09-17 Thread Vladislav Pernin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137579#comment-14137579
 ] 

Vladislav Pernin commented on KAFKA-1368:
-

No problem, as long as it is stable (I have no idea if this is the case for the
current version) and a layout with the same features as EnhancedPatternLayout
exists.
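
For illustration, a minimal log4j.properties sketch of the delimiter-based
layout described in the issue below (the appender name, file path, and the
exact conversion pattern are my own assumptions, not from the report):

  log4j.rootLogger=INFO, FILE
  log4j.appender.FILE=org.apache.log4j.FileAppender
  log4j.appender.FILE.File=/var/log/kafka/server.log
  log4j.appender.FILE.layout=org.apache.log4j.EnhancedPatternLayout
  # %throwable keeps the stack trace inside the <...> delimiters, so a
  # collector like Logstash can treat each entry as a single event.
  log4j.appender.FILE.layout.ConversionPattern=<[%d{ISO8601}] %p [%t] %m%throwable>%n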

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>
> Upgrade log4j to at least 1.2.16 or 1.2.17.
> Usage of EnhancedPatternLayout will be possible.
> It allows setting delimiters around the full log entry, stack trace included,
> making log collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #268

2014-09-17 Thread Apache Jenkins Server
See 

Changes:

[junrao] kafka-1620; Make kafka api protocol implementation public; patched by 
Anton Karamanov; reviewed by Jun Rao

--
[...truncated 394 lines...]
kafka.admin.AdminTest > testPartitionReassignmentNonOverlappingReplicas PASSED

kafka.admin.AdminTest > testReassigningNonExistingPartition PASSED

kafka.admin.AdminTest > testResumePartitionReassignmentThatWasCompleted PASSED

kafka.admin.AdminTest > testPreferredReplicaJsonData PASSED

kafka.admin.AdminTest > testBasicPreferredReplicaElection PASSED

kafka.admin.AdminTest > testShutdownBroker PASSED

kafka.admin.AdminTest > testTopicConfigChange PASSED

kafka.admin.AddPartitionsTest > testTopicDoesNotExist PASSED

kafka.admin.AddPartitionsTest > testWrongReplicaCount PASSED

kafka.admin.AddPartitionsTest > testIncrementPartitions PASSED

kafka.admin.AddPartitionsTest > testManualAssignmentOfReplicas PASSED

kafka.admin.AddPartitionsTest > testReplicaPlacement PASSED

kafka.admin.DeleteTopicTest > testDeleteTopicWithAllAliveReplicas PASSED

kafka.admin.DeleteTopicTest > testResumeDeleteTopicWithRecoveredFollower PASSED

kafka.admin.DeleteTopicTest > testResumeDeleteTopicOnControllerFailover PASSED

kafka.admin.DeleteTopicTest > testPartitionReassignmentDuringDeleteTopic PASSED

kafka.admin.DeleteTopicTest > testDeleteTopicDuringAddPartition FAILED
junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test 
path not deleted even after a replica is restarted
at junit.framework.Assert.fail(Assert.java:47)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:590)
at 
kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:246)
at 
kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:160)

kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic PASSED

kafka.admin.DeleteTopicTest > testRecreateTopicAfterDeletion PASSED

kafka.admin.DeleteTopicTest > testDeleteNonExistingTopic PASSED

kafka.api.ProducerFailureHandlingTest > testInvalidPartition PASSED

kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckZero PASSED

kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne PASSED

kafka.api.ProducerFailureHandlingTest > testNonExistentTopic PASSED

kafka.api.ProducerFailureHandlingTest > testWrongBrokerList PASSED

kafka.api.ProducerFailureHandlingTest > testNoResponse PASSED

kafka.api.ProducerFailureHandlingTest > testSendAfterClosed PASSED

kafka.api.ProducerFailureHandlingTest > testBrokerFailure PASSED

kafka.api.ProducerFailureHandlingTest > testCannotSendToInternalTopic PASSED

kafka.api.RequestResponseSerializationTest > 
testSerializationAndDeserialization PASSED

kafka.api.ApiUtilsTest > testShortStringNonASCII PASSED

kafka.api.ApiUtilsTest > testShortStringASCII PASSED

kafka.api.test.ProducerSendTest > testSendOffset PASSED

kafka.api.test.ProducerSendTest > testClose PASSED

kafka.api.test.ProducerSendTest > testSendToPartition PASSED

kafka.api.test.ProducerSendTest > testAutoCreateTopic PASSED

kafka.api.test.ProducerCompressionTest > testCompression[0] PASSED

kafka.api.test.ProducerCompressionTest > testCompression[1] PASSED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
FAILED
java.lang.RuntimeException: A broker is already registered on the path 
/brokers/ids/1. This probably indicates that you either have configured a 
brokerid that is already in use, or else you have shutdown this broker and 
restarted it faster than the zookeeper timeout so it appears to be 
re-registering.
at kafka.utils.ZkUtils$.registerBrokerInZk(ZkUtils.scala:174)
at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:63)
at kafka.server.KafkaHealthcheck.startup(KafkaHealthcheck.scala:45)
at kafka.server.KafkaServer.startup(KafkaServer.scala:121)
at 
kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
at 
kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

[jira] [Assigned] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-17 Thread Balaji Seshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Seshadri reassigned KAFKA-1618:
--

Assignee: Balaji Seshadri  (was: BalajiSeshadri)

> Exception thrown when running console producer with no port number for the 
> broker
> -
>
> Key: KAFKA-1618
> URL: https://issues.apache.org/jira/browse/KAFKA-1618
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Gwen Shapira
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1618-ALL.patch, KAFKA-1618.patch
>
>
> When running console producer with just "localhost" as the broker list, I get 
> ArrayIndexOutOfBounds exception.
> I expect either a clearer error about arguments or for the producer to 
> "guess" a default port.
> [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
> --broker-list localhost
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
>   at 
> kafka.producer.BrokerPartitionInfo.(BrokerPartitionInfo.scala:32)
>   at 
> kafka.producer.async.DefaultEventHandler.(DefaultEventHandler.scala:41)
>   at kafka.producer.Producer.(Producer.scala:59)
>   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
>   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
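
For illustration, a tolerant parse with a default-port fallback could look like
the sketch below (Scala; the object name and the 9092 default are assumptions
for illustration, not Kafka's actual code):

  object BrokerListParser {
    val DefaultPort = 9092  // assumed default, for illustration only

    // Accepts comma-separated "host" or "host:port" entries.
    def parse(brokerList: String): Seq[(String, Int)] =
      brokerList.split(",").map(_.trim).map { entry =>
        entry.split(":") match {
          case Array(host, port) => (host, port.toInt)
          case Array(host)       => (host, DefaultPort)
          case _ => throw new IllegalArgumentException(
            "invalid broker entry: " + entry + " (expected host or host:port)")
        }
      }
  }

With this, parse("localhost") yields Seq(("localhost", 9092)) instead of an
ArrayIndexOutOfBoundsException; the alternative discussed in the report would
be to throw the clearer IllegalArgumentException for the missing-port case.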



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1476) Get a list of consumer groups

2014-09-17 Thread Balaji Seshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Seshadri reassigned KAFKA-1476:
--

Assignee: Balaji Seshadri  (was: BalajiSeshadri)

> Get a list of consumer groups
> -
>
> Key: KAFKA-1476
> URL: https://issues.apache.org/jira/browse/KAFKA-1476
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Ryan Williams
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1476-RENAME.patch, KAFKA-1476.patch
>
>
> It would be useful to have a way to get a list of consumer groups currently 
> active via some tool/script that ships with kafka. This would be helpful so 
> that the system tools can be explored more easily.
> For example, when running the ConsumerOffsetChecker, it requires a group 
> option
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
> ?
> But, when just getting started with kafka, using the console producer and 
> consumer, it is not clear what value to use for the group option.  If a list 
> of consumer groups could be listed, then it would be clear what value to use.
> Background:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e
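
As a starting point, the high-level consumer registers each group under the
/consumers path in ZooKeeper, so a minimal sketch of such a tool could simply
list that path's children (the ZooKeeper connect string is an assumption):

  import org.I0Itec.zkclient.ZkClient
  import scala.collection.JavaConversions._

  object ListConsumerGroups {
    def main(args: Array[String]) {
      val zkClient = new ZkClient("localhost:2181", 30000, 30000)
      try {
        // each child znode of /consumers is a consumer group name
        zkClient.getChildren("/consumers").foreach(println)
      } finally {
        zkClient.close()
      }
    }
  }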



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-328) Write unit test for kafka server startup and shutdown API

2014-09-17 Thread Balaji Seshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Seshadri reassigned KAFKA-328:
-

Assignee: Balaji Seshadri  (was: BalajiSeshadri)

> Write unit test for kafka server startup and shutdown API 
> --
>
> Key: KAFKA-328
> URL: https://issues.apache.org/jira/browse/KAFKA-328
> Project: Kafka
>  Issue Type: Bug
>Reporter: Neha Narkhede
>Assignee: Balaji Seshadri
>  Labels: newbie
>
> Background discussion in KAFKA-320
> People often try to embed KafkaServer in an application that ends up calling 
> startup() and shutdown() repeatedly and sometimes in odd ways. To ensure this 
> works correctly we have to be very careful about cleaning up resources. This 
> is a good practice for making unit tests reliable anyway.
> A good first step would be to add some unit tests on startup and shutdown to 
> cover various cases:
> 1. A Kafka server can startup if it is not already starting up, if it is not 
> currently being shutdown, or if it hasn't been already started
> 2. A Kafka server can shutdown if it is not already shutting down, if it is 
> not currently starting up, or if it hasn't been already shutdown. 
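
A minimal sketch of the kind of test described above (createServerConfig() is a
hypothetical stand-in for the usual KafkaConfig test fixture, not an actual
Kafka test utility):

  import kafka.server.KafkaServer

  class ServerStartupShutdownTest {
    def testStartupAndRepeatedShutdown() {
      val server = new KafkaServer(createServerConfig())  // hypothetical helper
      server.startup()
      server.shutdown()
      server.shutdown()  // a repeated shutdown should be a safe no-op
    }
  }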



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1638) transient unit test failure UncleanLeaderElectionTest

2014-09-17 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-1638:
--

 Summary: transient unit test failure UncleanLeaderElectionTest
 Key: KAFKA-1638
 URL: https://issues.apache.org/jira/browse/KAFKA-1638
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao


Saw the following transient unit test failure.

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
FAILED
java.lang.RuntimeException: A broker is already registered on the path 
/brokers/ids/1. This probably indicates that you either have configured a 
brokerid that is already in use, or else you have shutdown this broker and 
restarted it faster than the zookeeper timeout so it appears to be 
re-registering.
at kafka.utils.ZkUtils$.registerBrokerInZk(ZkUtils.scala:174)
at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:63)
at kafka.server.KafkaHealthcheck.startup(KafkaHealthcheck.scala:45)
at kafka.server.KafkaServer.startup(KafkaServer.scala:121)
at 
kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
at 
kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
kafka.integration.UncleanLeaderElectionTest.verifyUncleanLeaderElectionEnabled(UncleanLeaderElectionTest.scala:187)
at 
kafka.integration.UncleanLeaderElectionTest.testUncleanLeaderElectionEnabled(UncleanLeaderElectionTest.scala:106)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1123) Broker IPv6 addresses parsed incorrectly

2014-09-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Szafrański updated KAFKA-1123:

Attachment: KAFKA-1123_v3.patch

I have updated the patch according to your remarks.

> Broker IPv6 addresses parsed incorrectly
> 
>
> Key: KAFKA-1123
> URL: https://issues.apache.org/jira/browse/KAFKA-1123
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Andrew Otto
>  Labels: newbie
> Attachments: KAFKA-1123.patch, KAFKA-1123_v2.patch, 
> KAFKA-1123_v3.patch
>
>
> It seems that broker addresses are parsed incorrectly when IPv6 addresses are 
> supplied.  IPv6 addresses have colons in them, and Kafka seems to be 
> interpreting the first : as the address:port separator.
> I have only tried this with the console-producer --broker-list option, so I 
> don't know if this affects anything deeper than the CLI.
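
For illustration, one conventional way to make the parse IPv6-safe is to
require bracketed IPv6 literals and split host from port on the last colon; a
sketch of that idea (not the attached patch itself):

  object HostPort {
    def parse(address: String): (String, Int) = {
      val idx = address.lastIndexOf(':')
      if (idx < 0)
        throw new IllegalArgumentException("missing port in: " + address)
      // strip the brackets of an IPv6 literal such as [2001:db8::1]:9092
      val host = address.substring(0, idx).stripPrefix("[").stripSuffix("]")
      (host, address.substring(idx + 1).toInt)
    }
  }

Here parse("[2001:db8::1]:9092") yields ("2001:db8::1", 9092), while
parse("localhost:9092") still yields ("localhost", 9092).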



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1639) Support control messages in Kafka

2014-09-17 Thread Chris Riccomini (JIRA)
Chris Riccomini created KAFKA-1639:
--

 Summary: Support control messages in Kafka
 Key: KAFKA-1639
 URL: https://issues.apache.org/jira/browse/KAFKA-1639
 Project: Kafka
  Issue Type: Improvement
Reporter: Chris Riccomini


The current transactionality proposal 
(https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka)
 and implementation use control messages to handle transactions in Kafka. Kafka 
traditionally hasn't had control messages in its topics. Transactionality (as 
it's implemented) introduces this pattern, but appears to do so in a very 
specific fashion (control messages only for transactions).

It seems to me that a good approach to control messages would be to generalize 
the control message model in Kafka to support not just transaction control 
messages, but arbitrary control messages. On the producer side, arbitrary 
control messages should be allowed to be sent, and on the consumer side, these 
control messages should be dropped by default.

Just like transactionality, this would let frameworks (e.g. Samza) and other 
app-specific implementations take advantage of in-topic control messages (as 
opposed to out of band control messages) without any impact on existing 
consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1639) Support control messages in Kafka

2014-09-17 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137751#comment-14137751
 ] 

Chris Riccomini commented on KAFKA-1639:


One point of discussion is whether the control message should be a full-blown 
message, or just a field in an existing message's payload. A full message seems 
like a more general solution to me.
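
To make the two alternatives concrete, a purely illustrative sketch (these
types are hypothetical, not a Kafka API):

  // Alternative 1: control messages as a first-class message type.
  sealed trait WireMessage
  case class DataMessage(key: Array[Byte], value: Array[Byte]) extends WireMessage
  case class ControlMessage(controlType: Int, payload: Array[Byte]) extends WireMessage

  // Alternative 2: a control flag carried on an ordinary message.
  case class TaggedMessage(isControl: Boolean, key: Array[Byte], value: Array[Byte])

With alternative 1, a consumer can drop anything that is a ControlMessage
without inspecting payloads, which is one reason a full message type looks
more general.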

> Support control messages in Kafka
> -
>
> Key: KAFKA-1639
> URL: https://issues.apache.org/jira/browse/KAFKA-1639
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chris Riccomini
>
> The current transactionality proposal 
> (https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka)
>  and implementation use control messages to handle transactions in Kafka. 
> Kafka traditionally hasn't had control messages in its topics. 
> Transactionality (as it's implemented) introduces this pattern, but appears 
> to do so in a very specific fashion (control messages only for transactions).
> It seems to me that a good approach to control messages would be to 
> generalize the control message model in Kafka to support not just transaction 
> control messages, but arbitrary control messages. On the producer side, 
> arbitrary control messages should be allowed to be sent, and on the consumer 
> side, these control messages should be dropped by default.
> Just like transactionality, this would let frameworks (e.g. Samza) and other 
> app-specific implementations take advantage of in-topic control messages (as 
> opposed to out of band control messages) without any impact on existing 
> consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137796#comment-14137796
 ] 

nicu marasoiu commented on KAFKA-1282:
--

Indeed, ack=1 solves it: it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: MBeans, dashes, underscores, and KAFKA-1481

2014-09-17 Thread Bhavesh Mistry
Sure, we can do option 2 for the JMX beans.  But the same solution should be
applied to the producer.metrics() method for the new producer.  Regardless of
how a metric is accessed (JMX or via the producer), the naming convention has
to be consistent.

For example, I get the following metric names when my topic is "topic.dot".  So
can we just use an escape character if the topic name or client.id contains
Kafka "reserved" chars (., -, _, etc.)?

topic.*topic.dot*.record-error-rate
topic.*topic.dot*.record-retry-rate
topic.*topic.dot*.byte-rate
topic.*topic.dot*.record-send-rate
topic.*topic.dot*.compression-rate
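
For reference, option 2 maps directly onto standard JMX ObjectName properties,
and ObjectName.quote is one existing way to keep "reserved" characters
unambiguous; a small sketch:

  import javax.management.ObjectName

  // comma-separated key=value properties are parsed by JMX itself
  val plain = new ObjectName("kafka.producer:clientId=myclientid,topic=mytopic")

  // quoting protects values containing characters like '.'
  val quoted = new ObjectName(
    "kafka.producer:clientId=myclientid,topic=" + ObjectName.quote("topic.dot"))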


Thanks,

Bhavesh

On Wed, Sep 17, 2014 at 9:35 AM, Jun Rao  wrote:

> Bhavesh,
>
> Yes, allowing dot in clientId and topic makes it a bit harder to define the
> JMX bean names. I see a couple of solutions here.
>
> 1. Disable dot in clientId and topic names. The issue is that dot may
> already be used in existing deployment.
>
> 2. We can represent the JMX bean name differently in the new producer.
> Instead of
>   kafka.producer.myclientid:type=mytopic
> we could change it to
>   kafka.producer:clientId=myclientid,topic=mytopic
>
> I felt that option 2 is probably better since it doesn't affect existing
> users.
>
> Otis,
>
> We probably can also use option 2 to address KAFKA-1481. For topic/clientid
> specific metrics, we could explicitly specify the metric name so that it
> contains "topic=mytopic,clientid=myclientid". That seems to be a much
> cleaner way than having all parts included in a single string separated by
> '|'.
>
> Thanks,
>
> Jun
>
>
>
>
> On Tue, Sep 16, 2014 at 5:15 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com>
> wrote:
>
> > HI Otis,
> >
> > What is the migration path?  If topics with special chars (".", "-", "|",
> > etc.) already exist in a previous version of the Kafka producer/consumer,
> > what happens after the upgrade to the new producer or consumer (Kafka
> > version)?  Also, in the new producer API (Kafka trunk), does this enforce
> > the rule about client id as well?
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Tue, Sep 16, 2014 at 2:09 PM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > So maybe I should have asked the question explicitly:
> > > Could we commit the patch from
> > > https://issues.apache.org/jira/browse/KAFKA-1481 now that, I hope,
> it's
> > > clear what problems the current MBean names can cause?
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On Mon, Sep 15, 2014 at 10:40 PM, Otis Gospodnetic <
> > > otis.gospodne...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > *Problem:*
> > > > Some Kafka 0.8.x MBeans have names composed of things like <consumer
> > > > group>-<topic>-<metric>.  Note how dashes are used as delimiters.
> > > >  When <consumer group> and <topic> don't contain the delimiter character
> > > > all is good if you want to extract parts of this MBean name by simply
> > > > splitting on the delimiter character.  The problem is that dashes are
> > > > allowed in topic and group names, so this splitting doesn't work.
> > > > Moreover, underscores are also used as delimiters, and they can also
> be
> > > > used in things like topic names.
> > > >
> > > > *Example*:
> > > > This MBean's name is composed of <consumer group>-<topic>-BytesPerSec:
> > > >
> > > > kafka.consumer:type="ConsumerTopicMetrics",
> > name="*myGroup**-myTopic**-*
> > > > BytesPerSec"
> > > >
> > > > Here we can actually split on "-" and extract all 3 parts from the
> > > > MBean name:
> > > > * consumer group ('*myGroup*')
> > > > * topic ('*myTopic*')
> > > > * metric (‘BytesPerSec’)
> > > >
> > > > All good!
> > > >
> > > > But imagine if I named the group: *my-Group*
> > > > And if I named the topic: *my-Topic*
> > > >
> > > > Then we'd have:
> > > > kafka.consumer:type="ConsumerTopicMetrics",
> > > name="*my-Group**-my-Topic**-*
> > > > BytesPerSec"
> > > >
> > > > Now splitting on "-" would no longer work!  To extract "my-Group" and
> > > > "my-Topic" and "BytesPerSec" parts I would have to know the specific
> > > group
> > > > name and topic name to look for and could not use generic approach of
> > > just
> > > > splitting the MBean name on the delimiter.
> > > >
> > > > *Solution*:
> > > > The patch in https://issues.apache.org/jira/browse/KAFKA-1481
> replaces
> > > > all _ and - characters where they are used as delimiters in MBean
> names
> > > > with a "|" character.  Because the "I" character is not allowed in
> > topic
> > > > names, consumer groups, host names, splitting on this new and unified
> > > > delimiter works.
> > > >
> > > > I hope this explains the problem, the solution, and that this can
> make
> > it
> > > > in the next 0.8.x.
> > > >
> > > > Otis
> > > > --
> > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > >
> >
>


[Java New Producer] CPU Usage Spike to 100% when network connection is lost

2014-09-17 Thread Bhavesh Mistry
Hi Kafka Dev team,

I see my CPU spike to 100% when the network connection is lost for a while.  It
seems the network I/O thread is very busy logging the following error message.
Is this expected behavior?

2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
kafka producer I/O thread:

java.lang.IllegalStateException: No entry found for node -2

at org.apache.kafka.clients.ClusterConnectionStates.nodeState(
ClusterConnectionStates.java:110)

at org.apache.kafka.clients.ClusterConnectionStates.disconnected(
ClusterConnectionStates.java:99)

at org.apache.kafka.clients.NetworkClient.initiateConnect(
NetworkClient.java:394)

at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(
NetworkClient.java:380)

at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)

at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)

at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)

at java.lang.Thread.run(Thread.java:744)

Thanks,

Bhavesh


Re: MBeans, dashes, underscores, and KAFKA-1481

2014-09-17 Thread Otis Gospodnetic
Hi Jun,

On Wed, Sep 17, 2014 at 12:35 PM, Jun Rao  wrote:

> Bhavesh,
>
> Yes, allowing dot in clientId and topic makes it a bit harder to define the
> JMX bean names. I see a couple of solutions here.
>
> 1. Disable dot in clientId and topic names. The issue is that dot may
> already be used in existing deployment.
>
> 2. We can represent the JMX bean name differently in the new producer.
> Instead of
>   kafka.producer.myclientid:type=mytopic
> we could change it to
>   kafka.producer:clientId=myclientid,topic=mytopic
>
> I felt that option 2 is probably better since it doesn't affect existing
> users.
>

If it doesn't affect existing users, great.

If you are saying that each "piece" of the MBean name could be expressed as a
name=value pair, separated by something like "," (forbidden in host names,
topic names, client IDs, etc., I assume?), then yes, I think this would be
easier to parse and it would be easier for people to understand what is what.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



>
> Otis,
>
> We probably can also use option 2 to address KAFKA-1481. For topic/clientid
> specific metrics, we could explicitly specify the metric name so that it
> contains "topic=mytopic,clientid=myclientid". That seems to be a much
> cleaner way than having all parts included in a single string separated by
> '|'.
>
> Thanks,
>
> Jun
>
>
>
>
> On Tue, Sep 16, 2014 at 5:15 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com>
> wrote:
>
> > HI Otis,
> >
> > What is the migration path?  If topics with special chars (".", "-", "|",
> > etc.) already exist in a previous version of the Kafka producer/consumer,
> > what happens after the upgrade to the new producer or consumer (Kafka
> > version)?  Also, in the new producer API (Kafka trunk), does this enforce
> > the rule about client id as well?
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Tue, Sep 16, 2014 at 2:09 PM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > So maybe I should have asked the question explicitly:
> > > Could we commit the patch from
> > > https://issues.apache.org/jira/browse/KAFKA-1481 now that, I hope,
> it's
> > > clear what problems the current MBean names can cause?
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On Mon, Sep 15, 2014 at 10:40 PM, Otis Gospodnetic <
> > > otis.gospodne...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > *Problem:*
> > > > Some Kafka 0.8.x MBeans have names composed of things like <consumer
> > > > group>-<topic>-<metric>.  Note how dashes are used as delimiters.
> > > >  When <consumer group> and <topic> don't contain the delimiter character
> > > > all is good if you want to extract parts of this MBean name by simply
> > > > splitting on the delimiter character.  The problem is that dashes are
> > > > allowed in topic and group names, so this splitting doesn't work.
> > > > Moreover, underscores are also used as delimiters, and they can also
> be
> > > > used in things like topic names.
> > > >
> > > > *Example*:
> > > > This MBean's name is composed of <consumer group>-<topic>-BytesPerSec:
> > > >
> > > > kafka.consumer:type="ConsumerTopicMetrics",
> > name="*myGroup**-myTopic**-*
> > > > BytesPerSec"
> > > >
> > > > Here we can actually split on "-" and extract all 3 parts from the
> > > > MBean name:
> > > > * consumer group ('*myGroup*')
> > > > * topic ('*myTopic*')
> > > > * metric (‘BytesPerSec’)
> > > >
> > > > All good!
> > > >
> > > > But imagine if I named the group: *my-Group*
> > > > And if I named the topic: *my-Topic*
> > > >
> > > > Then we'd have:
> > > > kafka.consumer:type="ConsumerTopicMetrics",
> > > name="*my-Group**-my-Topic**-*
> > > > BytesPerSec"
> > > >
> > > > Now splitting on "-" would no longer work!  To extract "my-Group" and
> > > > "my-Topic" and "BytesPerSec" parts I would have to know the specific
> > > group
> > > > name and topic name to look for and could not use generic approach of
> > > just
> > > > splitting the MBean name on the delimiter.
> > > >
> > > > *Solution*:
> > > > The patch in https://issues.apache.org/jira/browse/KAFKA-1481
> replaces
> > > > all _ and - characters where they are used as delimiters in MBean
> names
> > > > with a "|" character.  Because the "I" character is not allowed in
> > topic
> > > > names, consumer groups, host names, splitting on this new and unified
> > > > delimiter works.
> > > >
> > > > I hope this explains the problem, the solution, and that this can
> make
> > it
> > > > in the next 0.8.x.
> > > >
> > > > Otis
> > > > --
> > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > >
> >
>


[jira] [Commented] (KAFKA-1382) Update zkVersion on partition state update failures

2014-09-17 Thread Noah Yetter (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138191#comment-14138191
 ] 

Noah Yetter commented on KAFKA-1382:


We put a decent amount of effort into replicating this scenario and were unable
to.  It has only happened to us in the wild, and never for any apparent reason.

> Update zkVersion on partition state update failures
> ---
>
> Key: KAFKA-1382
> URL: https://issues.apache.org/jira/browse/KAFKA-1382
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>Assignee: Sriharsha Chintalapani
> Fix For: 0.8.2, 0.8.1.2
>
> Attachments: KAFKA-1382.patch, KAFKA-1382_2014-05-30_21:19:21.patch, 
> KAFKA-1382_2014-05-31_15:50:25.patch, KAFKA-1382_2014-06-04_12:30:40.patch, 
> KAFKA-1382_2014-06-07_09:00:56.patch, KAFKA-1382_2014-06-09_18:23:42.patch, 
> KAFKA-1382_2014-06-11_09:37:22.patch, KAFKA-1382_2014-06-16_13:50:16.patch, 
> KAFKA-1382_2014-06-16_14:19:27.patch
>
>
> Our updateIsr code is currently:
>   private def updateIsr(newIsr: Set[Replica]) {
> debug("Updated ISR for partition [%s,%d] to %s".format(topic, 
> partitionId, newIsr.mkString(",")))
> val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch, 
> newIsr.map(r => r.brokerId).toList, zkVersion)
> // use the epoch of the controller that made the leadership decision, 
> instead of the current controller epoch
> val (updateSucceeded, newVersion) = 
> ZkUtils.conditionalUpdatePersistentPath(zkClient,
>   ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
>   ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
> if (updateSucceeded){
>   inSyncReplicas = newIsr
>   zkVersion = newVersion
>   trace("ISR updated to [%s] and zkVersion updated to 
> [%d]".format(newIsr.mkString(","), zkVersion))
> } else {
>   info("Cached zkVersion [%d] not equal to that in zookeeper, skip 
> updating ISR".format(zkVersion))
> }
> We encountered an interesting scenario recently when a large producer fully
> saturated the broker's NIC for over an hour. The large volume of data led to
> a number of ISR shrinks (and subsequent expands). The NIC saturation
> affected the zookeeper client heartbeats and led to a session timeout. The
> timeline was roughly as follows:
> - Attempt to expand ISR
> - Expansion written to zookeeper (confirmed in zookeeper transaction logs)
> - Session timeout after around 13 seconds (the configured timeout is 20
>   seconds) so that lines up.
> - zkclient reconnects to zookeeper (with the same session ID) and retries
>   the write - but uses the old zkVersion. This fails because the zkVersion
>   has already been updated (above).
> - The ISR expand keeps failing after that and the only way to get out of it
>   is to bounce the broker.
> In the above code, if the zkVersion is different we should probably update
> the cached version and even retry the expansion until it succeeds.
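
A sketch of that suggestion against the 0.8.x ZkUtils API (simplified, with an
unbounded retry loop purely for illustration):

  private def updateIsr(newIsr: Set[Replica]) {
    var done = false
    while (!done) {
      val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch,
        newIsr.map(r => r.brokerId).toList, zkVersion)
      val path = ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId)
      val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(
        zkClient, path,
        ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
      if (updateSucceeded) {
        inSyncReplicas = newIsr
        zkVersion = newVersion
        done = true
      } else {
        // refresh the cached zkVersion from zookeeper and retry the update
        val (_, stat) = ZkUtils.readData(zkClient, path)
        zkVersion = stat.getVersion
      }
    }
  }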



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-17 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138359#comment-14138359
 ] 

Neha Narkhede commented on KAFKA-1618:
--

Agree with [~gwenshap] that whichever way we go, we should make sure that the 
behavior is consistent across all tools. My intuition is that it will be easier 
if we don't guess the port. We can make sure all tools behave the same way and 
give the same error message if the user does not enter the port. Would you like 
to take a stab at it?

> Exception thrown when running console producer with no port number for the 
> broker
> -
>
> Key: KAFKA-1618
> URL: https://issues.apache.org/jira/browse/KAFKA-1618
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Gwen Shapira
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1618-ALL.patch, KAFKA-1618.patch
>
>
> When running console producer with just "localhost" as the broker list, I get 
> ArrayIndexOutOfBounds exception.
> I expect either a clearer error about arguments or for the producer to 
> "guess" a default port.
> [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
> --broker-list localhost
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
>   at 
> kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
>   at 
> kafka.producer.BrokerPartitionInfo.(BrokerPartitionInfo.scala:32)
>   at 
> kafka.producer.async.DefaultEventHandler.(DefaultEventHandler.scala:41)
>   at kafka.producer.Producer.(Producer.scala:59)
>   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
>   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1638) transient unit test failure UncleanLeaderElectionTest

2014-09-17 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1638:
-
Labels: newbie++  (was: )

> transient unit test failure UncleanLeaderElectionTest
> -
>
> Key: KAFKA-1638
> URL: https://issues.apache.org/jira/browse/KAFKA-1638
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>  Labels: newbie++
>
> Saw the following transient unit test failure.
> kafka.integration.UncleanLeaderElectionTest > 
> testUncleanLeaderElectionEnabled FAILED
> java.lang.RuntimeException: A broker is already registered on the path 
> /brokers/ids/1. This probably indicates that you either have configured a 
> brokerid that is already in use, or else you have shutdown this broker and 
> restarted it faster than the zookeeper timeout so it appears to be 
> re-registering.
> at kafka.utils.ZkUtils$.registerBrokerInZk(ZkUtils.scala:174)
> at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:63)
> at kafka.server.KafkaHealthcheck.startup(KafkaHealthcheck.scala:45)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:121)
> at 
> kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
> at 
> kafka.integration.UncleanLeaderElectionTest$$anonfun$verifyUncleanLeaderElectionEnabled$8.apply(UncleanLeaderElectionTest.scala:187)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at 
> kafka.integration.UncleanLeaderElectionTest.verifyUncleanLeaderElectionEnabled(UncleanLeaderElectionTest.scala:187)
> at 
> kafka.integration.UncleanLeaderElectionTest.testUncleanLeaderElectionEnabled(UncleanLeaderElectionTest.scala:106)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Java New Producer] CPU Usage Spike to 100% when network connection is lost

2014-09-17 Thread Neha Narkhede
This seems like a problem.  Can you please file a JIRA and attach the log4j
output there?

On Wed, Sep 17, 2014 at 2:14 PM, Bhavesh Mistry 
wrote:

> Hi Kafka Dev team,
>
> I see my CPU spike to 100% when the network connection is lost for a while.  It
> seems the network I/O thread is very busy logging the following error message.
> Is this expected behavior?
>
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
> kafka producer I/O thread:
>
> java.lang.IllegalStateException: No entry found for node -2
>
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(
> ClusterConnectionStates.java:110)
>
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(
> ClusterConnectionStates.java:99)
>
> at org.apache.kafka.clients.NetworkClient.initiateConnect(
> NetworkClient.java:394)
>
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(
> NetworkClient.java:380)
>
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>
> at java.lang.Thread.run(Thread.java:744)
>
> Thanks,
>
> Bhavesh
>


[jira] [Comment Edited] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137796#comment-14137796
 ] 

nicu marasoiu edited comment on KAFKA-1282 at 9/18/14 5:12 AM:
---

Indeed, ack=1 solves it: it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.

[~jkreps], [~nehanarkhede], [~junrao], please let me know any feedback on my
last uploaded patch; I think it should be OK.


was (Author: nmarasoi):
Indeed, ack=1 solves it: it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137796#comment-14137796
 ] 

nicu marasoiu edited comment on KAFKA-1282 at 9/18/14 5:21 AM:
---

Indeed, ack=1 solves it most of the time, but not always:
- in 6 of 7 tests it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.
- in one test, after leaving the laptop overnight, I entered:
sdfgsdfgdsfg --> that never returned, no exception, nothing at all reported
aaa
aaa
ff
ff



was (Author: nmarasoi):
Indeed, ack=1 solves it: it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.

[~jkreps], [~nehanarkhede], [~junrao], please let me know any feedback on my
last uploaded patch; I think it should be OK.

> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-17 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137796#comment-14137796
 ] 

nicu marasoiu edited comment on KAFKA-1282 at 9/18/14 5:25 AM:
---

Indeed, ack=1 solves it most of the time, but not always:
- in 6 of 7 tests it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.
- in one test, after leaving the laptop overnight, I entered:
sdfgsdfgdsfg --> that never returned, no exception, nothing at all reported
aaa
aaa
ff
ff

The "ok" flow, which reproduces most of the time with ack=1 is (sometimes with 
just one of the 2 expcetions):
gffhgfhgfjfgjhfhjfgjhf
[2014-09-18 08:22:35,057] WARN Failed to send producer request with correlation 
id 43 to broker 0 with data for partitions [topi,0] 
(kafka.producer.async.DefaultEventHandler)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
..
at 
kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44)
[2014-09-18 08:22:36,663] WARN Fetching topic metadata with correlation id 44 
for topics [Set(topi)] from broker [id:0,host:localhost,port:9092] failed 
(kafka.client.ClientUtils$)
java.net.SocketTimeoutException
at 
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:226)
..
[2014-09-18 08:22:36,664] ERROR fetching topic metadata for topics [Set(topi)] 
from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed 
(kafka.utils.Utils$)
kafka.common.KafkaException: fetching topic metadata for topics [Set(topi)] 
from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:71)
..
Caused by: java.net.SocketTimeoutException
at 
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:226)
.. 
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:74)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:57)
... 12 more
gffhgfhgfjfgjhfhjfgjhf


was (Author: nmarasoi):
Indeed, ack=1 solves it most of the time, but not always:
- in 6 of 7 tests it gets a reset by peer and a socket timeout on fetch
metadata, then reconnects and sends the message.
- in one test, after leaving the laptop overnight, I entered:
sdfgsdfgdsfg --> that never returned, no exception, nothing at all reported
aaa
aaa
ff
ff


> Disconnect idle socket connection in Selector
> -
>
> Key: KAFKA-1282
> URL: https://issues.apache.org/jira/browse/KAFKA-1282
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.9.0
>
> Attachments: 1282_brushed_up.patch, 
> KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch
>
>
> To reduce # socket connections, it would be useful for the new producer to 
> close socket connections that are idle. We can introduce a new producer 
> config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Java New Producer] CPU Usage Spike to 100% when network connection is lost

2014-09-17 Thread Jay Kreps
Also, do you know what version you are running?  We did fix several bugs
similar to this against trunk.

-Jay

On Wed, Sep 17, 2014 at 2:14 PM, Bhavesh Mistry
 wrote:
> Hi Kafka Dev team,
>
> I see my CPU spike to 100% when the network connection is lost for a while.  It
> seems the network I/O thread is very busy logging the following error message.
> Is this expected behavior?
>
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
> kafka producer I/O thread:
>
> java.lang.IllegalStateException: No entry found for node -2
>
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(
> ClusterConnectionStates.java:110)
>
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(
> ClusterConnectionStates.java:99)
>
> at org.apache.kafka.clients.NetworkClient.initiateConnect(
> NetworkClient.java:394)
>
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(
> NetworkClient.java:380)
>
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>
> at java.lang.Thread.run(Thread.java:744)
>
> Thanks,
>
> Bhavesh


[jira] [Commented] (KAFKA-1633) Data loss if broker is killed

2014-09-17 Thread gautham varada (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138603#comment-14138603
 ] 

gautham varada commented on KAFKA-1633:
---

Let me explain this a bit better.

Say I have my JMeter app send 100 events to the Kafka broker.  The broker acks
60 events before both brokers are killed.  Should I expect 60 events in the
Kafka logs?

If yes, then I don't see this behaviour; it's always less than 60 when the
producer retry is set to the default value of 3.

I repeat the same test with the retry count as 1 and I don't lose any messages.
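
For reference, the retry knob being discussed is the 0.8.x producer's
message.send.max.retries setting; a minimal sketch of the configuration under
test (broker hosts are illustrative):

  import java.util.Properties
  import kafka.producer.ProducerConfig

  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092,broker2:9092")
  props.put("request.required.acks", "1")
  props.put("message.send.max.retries", "1")  // vs. the default of 3
  val config = new ProducerConfig(props)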

> Data loss if broker is killed
> -
>
> Key: KAFKA-1633
> URL: https://issues.apache.org/jira/browse/KAFKA-1633
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
> Environment: centos 6.3, open jdk 7
>Reporter: gautham varada
>Assignee: Jun Rao
>
> We have a 2-node Kafka cluster; we experienced data loss when we did a kill
> -9 on the brokers.  We also found a workaround to prevent this loss.
> Replication factor :2, 4 partitions
> Steps to reproduce
> 1. Create a 2 node cluster with replication factor 2, num partitions 4
> 2. We used Jmeter to pump events
> 3. We used kafka web console to inspect the log size after the test
> During the test, we simultaneously killed the brokers using kill -9.  We
> tallied the metrics reported by JMeter against the size we observed in the
> web console, and we lost tons of messages.
> We went back and set the producer retry to 1 instead of the default 3,
> repeated the above tests, and we did not lose a single message.
> We repeated the above tests with the producer retry set to 3 and 1 with a
> single broker, and we observed data loss when the retry was 3 and no loss
> when the retry was 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)