Re: Metrics package discussion

2015-03-29 Thread Jun Rao
There is another thing to consider. We plan to reuse the client components
on the server side over time. For example, as part of the security work, we
are looking into replacing the server side network code with the client
network code (KAFKA-1928). However, the client network already has metrics
based on KM.

Thanks,

Jun

On Sat, Mar 28, 2015 at 1:34 PM, Jay Kreps  wrote:

> I think Joel's summary is good.
>
> I'll add a few more points:
>
> As discussed, memory matters a lot if we want to be able to give percentiles
> at the client or topic level, in which case we will have thousands of them.
> If we just do histograms at the global level then it is not a concern. The
> argument for doing histograms at the client and topic level is that
> averages are often very misleading, especially for latency information or
> other asymmetric distributions. Most people who care about this kind of
> thing would say the same. If you are a user of a multi-tenant cluster then
> you probably care a lot more about stats for your application or your topic
> than about the global ones, so it could be nice to have histograms for
> these. I don't feel super strongly about this.
>
> The ExponentiallyDecayingSample is internally
> a ConcurrentSkipListMap, which seems to have an overhead of
> about 64 bytes per entry. So a 1000-element sample is about 64KB. For global
> metrics this is fine, but for granular metrics it is not workable.
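>
> To make that arithmetic concrete, a rough back-of-the-envelope sketch (the
> per-entry overhead and metric counts below are illustrative assumptions, not
> measurements):
>
> {code}
> public class HistogramMemoryEstimate {
>   public static void main(String[] args) {
>     long bytesPerEntry = 64L;      // assumed ConcurrentSkipListMap node overhead
>     int sampleSize = 1000;         // reservoir entries per histogram
>     long bytesPerHistogram = bytesPerEntry * sampleSize;     // roughly 64 KB
>
>     int granularMetrics = 5000;    // hypothetical per-client/per-topic histograms
>     long totalBytes = bytesPerHistogram * granularMetrics;   // roughly 300 MB
>
>     System.out.printf("per histogram: %d KB, total for %d metrics: %d MB%n",
>         bytesPerHistogram / 1024, granularMetrics, totalBytes / (1024 * 1024));
>   }
> }
> {code}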
>
> Two other issues I'm not sure about:
>
> 1. Is there a way to get metric descriptions into the coda hale JMX output?
> One of the nicest practical things about the new client metrics is that if
> you look at them in jconsole, each metric has an associated description that
> explains what it means. I think this is a nice usability thing--it is really
> hard to know what to make of the current metrics without this kind of
> documentation; keeping separate docs up-to-date is hard, and even if you do
> it, most people won't find them. (A registration sketch showing how the
> client metrics attach descriptions follows after these questions.)
>
> 2. I'm not clear if the sample decay in the histogram is actually the same
> as for the other stats. It seems like it isn't, which would make
> interpretation quite difficult. In other words, if I have N metrics
> including some Histograms, some Meters, etc., are all these measurements
> taken over the same time window? I actually think they are not; it looks
> like there are different sampling methodologies across the stat types. This
> means that if you have a dashboard that plots these things side by side,
> the measurement at a given point in time is not actually comparable across
> multiple stats. Am I confused about this?
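>
> Regarding the description question in point 1: for reference, a minimal
> sketch of how the new client metrics attach a description at registration
> time (the sensor and metric names here are made up for illustration):
>
> {code}
> import java.util.HashMap;
> import org.apache.kafka.common.MetricName;
> import org.apache.kafka.common.metrics.Metrics;
> import org.apache.kafka.common.metrics.Sensor;
> import org.apache.kafka.common.metrics.stats.Avg;
> import org.apache.kafka.common.metrics.stats.Max;
>
> public class DescribedMetricExample {
>   public static void main(String[] args) {
>     Metrics metrics = new Metrics();
>     Sensor latency = metrics.sensor("request-latency");
>     // The description travels with the metric; it is what jconsole shows
>     // when a JmxReporter is registered with the Metrics instance.
>     latency.add(new MetricName("request-latency-avg", "example-metrics",
>         "The average request latency in ms", new HashMap<String, String>()), new Avg());
>     latency.add(new MetricName("request-latency-max", "example-metrics",
>         "The maximum request latency in ms", new HashMap<String, String>()), new Max());
>     latency.record(42.0);
>     metrics.close();
>   }
> }
> {code}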
>
> -Jay
>
>
> On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy  wrote:
>
> > For the samples: it will be at least double that estimate I think
> > since the long array contains (eight byte) references to the actual
> > longs, each of which also has some object overhead.
> >
> > Re: testing: actually, it looks like YM metrics does allow you to
> > drop in your own clock:
> >
> >
> https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Clock.java
> >
> >
> https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Meter.java#L36
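> >
> > For example, something along these lines (a sketch against the codahale 3.x
> > API linked above; the ManualClock test helper is made up for illustration):
> >
> > {code}
> > import com.codahale.metrics.Clock;
> > import com.codahale.metrics.Meter;
> >
> > public class ClockedMeterExample {
> >   // Deterministic clock for tests: time only advances when told to.
> >   static class ManualClock extends Clock {
> >     private long tickNanos = 0;
> >     @Override public long getTick() { return tickNanos; }
> >     void advance(long seconds) { tickNanos += seconds * 1_000_000_000L; }
> >   }
> >
> >   public static void main(String[] args) {
> >     ManualClock clock = new ManualClock();
> >     Meter meter = new Meter(clock);   // drop in the custom clock
> >     for (int i = 0; i < 100; i++) meter.mark();
> >     clock.advance(10);                // the meter's EWMA ticks every 5 seconds
> >     System.out.println("count=" + meter.getCount()
> >         + ", one-minute rate=" + meter.getOneMinuteRate());
> >   }
> > }
> > {code}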
> >
> > Not sure if it was mentioned in this (or some recent) thread but a
> > major motivation for the kafka-common metrics (KM) was absorbing API
> > changes and even mbean naming conventions. For example, in the early
> > stages of 0.8 we picked up YM metrics 3.x but collided with client
> > apps at LinkedIn which were still on 2.x. We ended up changing our
> > code back to 2.x. Having our own metrics package makes us
> > less vulnerable to these kinds of changes. The multiple version
> > collision problem is obviously less of an issue with the broker but we
> > are still exposed to possible metric changes in YM metrics.
> >
> > I'm wondering if we need to weigh the memory overhead of histograms
> > too heavily in making a decision here, simply because I don't think
> > we have found them to be a strict necessity for
> > per-clientid/per-partition metrics; they are more critical for
> > aggregate (global) metrics.
> >
> > So it seems the main benefits of switching to KM metrics are:
> > - Less exposure to YM metrics changes
> > - More control over the actual implementation. E.g., there is
> >   considerable research on implementing approximate-but-good-enough
> >   histograms/percentiles that we can try out (see the sketch after
> >   this list)
> > - Differences (improvements) from YM metrics such as:
> >   - hierarchical sensors
> >   - integrated with quota enforcement
> >   - mbeans can logically group attributes computed from different
> > sensors. So there is logical grouping (as opposed to a separate
> > mbean per sensor as is the case in YM metrics).
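> >
> > To make the histogram point above concrete, a sketch of the bucketed,
> > fixed-memory percentiles approach (assuming the Percentiles/Percentile
> > stats in the common metrics package; names and bounds are illustrative):
> >
> > {code}
> > import java.util.Collections;
> > import org.apache.kafka.common.MetricName;
> > import org.apache.kafka.common.metrics.Metrics;
> > import org.apache.kafka.common.metrics.Sensor;
> > import org.apache.kafka.common.metrics.stats.Percentile;
> > import org.apache.kafka.common.metrics.stats.Percentiles;
> >
> > public class BucketedPercentilesExample {
> >   public static void main(String[] args) {
> >     Metrics metrics = new Metrics();
> >     Sensor latency = metrics.sensor("request-latency");
> >     // Memory is bounded by sizeInBytes (4 KB here), not by the number of samples.
> >     latency.add(new Percentiles(4096, 0.0, 1000.0, Percentiles.BucketSizing.LINEAR,
> >         new Percentile(name("latency-p50"), 50.0),
> >         new Percentile(name("latency-p99"), 99.0)));
> >     for (int i = 0; i < 10000; i++) latency.record(i % 1000);
> >     metrics.close();
> >   }
> >
> >   private static MetricName name(String n) {
> >     return new MetricName(n, "example-metrics", "illustrative percentile",
> >         Collections.<String, String>emptyMap());
> >   }
> > }
> > {code}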
> >
> > The main disadvantages:
> > - Everyone's graphs and alerts will break and need to be updated
> > - Histogram support needs to be tested more/improved
> >
> > The first disadvantage is a big one but we aren't exac

[jira] [Commented] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2015-03-29 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385848#comment-14385848
 ] 

Jun Rao commented on KAFKA-1634:


Perhaps, we can do KAFKA-1841 together with KAFKA-2068. There will probably be 
less duplicated code that way.

> Improve semantics of timestamp in OffsetCommitRequests and update 
> documentation
> ---
>
> Key: KAFKA-1634
> URL: https://issues.apache.org/jira/browse/KAFKA-1634
> Project: Kafka
>  Issue Type: Bug
>Reporter: Neha Narkhede
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
> KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
> KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, 
> KAFKA-1634_2014-12-01_18:03:12.patch, KAFKA-1634_2015-01-14_15:50:15.patch, 
> KAFKA-1634_2015-01-21_16:43:01.patch, KAFKA-1634_2015-01-22_18:47:37.patch, 
> KAFKA-1634_2015-01-23_16:06:07.patch, KAFKA-1634_2015-02-06_11:01:08.patch, 
> KAFKA-1634_2015-03-26_12:16:09.patch, KAFKA-1634_2015-03-26_12:27:18.patch
>
>
> From the mailing list -
> following up on this -- I think the online API docs for OffsetCommitRequest
> still incorrectly refer to client-side timestamps:
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
> Wasn't that removed and now always handled server-side now?  Would one of
> the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2068) Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent

2015-03-29 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385850#comment-14385850
 ] 

Jun Rao commented on KAFKA-2068:


We may want to consider if it's better to do KAFKA-1841 together with this 
jira. 

> Replace OffsetCommit Request/Response with  org.apache.kafka.common.requests  
> equivalent
> 
>
> Key: KAFKA-2068
> URL: https://issues.apache.org/jira/browse/KAFKA-2068
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Gwen Shapira
> Fix For: 0.8.3
>
>
> Replace OffsetCommit Request/Response with  org.apache.kafka.common.requests  
> equivalent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28769: Patch for KAFKA-1809

2015-03-29 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28769/#review78160
---


Thanks for the latest patch. A few more comments below.


clients/src/main/java/org/apache/kafka/common/protocol/ApiVersion.java


Since this is for intra-broker communication, should we move this class to 
core?



core/src/main/scala/kafka/network/SocketServer.scala


Could we only import this in the context where the implicit is actually used?



core/src/main/scala/kafka/network/SocketServer.scala


For consistency, let's leave a space after comma. There are probably a few 
other places that need to be changed.



core/src/main/scala/kafka/server/KafkaConfig.scala


Could that just be intra.broker.protocol.version?



core/src/main/scala/kafka/server/KafkaConfig.scala


It seems that a 3-digit value like 0.8.2 is also valid?



core/src/main/scala/kafka/server/KafkaConfig.scala


Could this be private?



core/src/main/scala/kafka/server/KafkaConfig.scala


Do we default to PLAINTEXT://null:6667 or PLAINTEXT://:9092? Do we have to 
explicitly deal with hostName being null?



core/src/main/scala/kafka/server/KafkaConfig.scala


Similar comment to the above: do we need to handle a null advertisedHostName 
explicitly?



core/src/main/scala/kafka/server/MetadataCache.scala


Indentation needs to be 2 spaces.



core/src/main/scala/kafka/utils/Utils.scala


Should we just return scala.collection.Map?



core/src/test/scala/unit/kafka/cluster/BrokerTest.scala


Is this necessary given the above tests on equals and hashCode?



core/src/test/scala/unit/kafka/cluster/BrokerTest.scala


Is toString() customized on EndPoint or do you intend to use 
connectionString()? Also, if host is null, should we standardize the connection 
string to be PLAINTEXT://:9092?



core/src/test/scala/unit/kafka/network/SocketServerTest.scala


Could we import the implicit only in the context where it's used?



core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala


This can just be endpoints(SecurityProtocol.PLAINTEXT).port.



core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala


Ditto as the above.



core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala


Ditto as the above.



core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala


Ditto as the above.



system_test/run_all.sh


Would it be better to just have testcase_to_run.json be 
testcase_to_run_all.json? Also, could we change README accordingly?


- Jun Rao


On March 27, 2015, 10:04 p.m., Gwen Shapira wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28769/
> ---
> 
> (Updated March 27, 2015, 10:04 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1809
> https://issues.apache.org/jira/browse/KAFKA-1809
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> forgot rest of patch
> 
> 
> merge with trunk
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
> fa9daaef66ff7961e1c46cd0cd8fed18a53bccd8 
>   clients/src/main/java/org/apache/kafka/common/protocol/ApiVersion.java 
> PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/common/protocol/SecurityProtocol.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
> 920b51a6c3c99639fbc9dc0656373c19fabd 
>   clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 
> c899813d55b9c4786adde3d840f040d6645d27c8 
>   config/server.properties 1614260b71a658b405bb24157c8f12b1f1031aa5 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> b700110f2d7f1ede235af55d8e37e1b5592c6c7d 
>   core/src/main/scala/kafka/admin/TopicCommand.scala 
> f400b71f8444fffd3fc1d8398a283682390eba4e 
>   core/src/main/scala/kaf

[jira] [Commented] (KAFKA-1809) Refactor brokers to allow listening on multiple ports and IPs

2015-03-29 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385906#comment-14385906
 ] 

Jun Rao commented on KAFKA-1809:


Thanks for the latest patch. Reviewed. Could you also provide a doc change on 
how to upgrade from 0.8.[0,1,2] to 0.8.3?

> Refactor brokers to allow listening on multiple ports and IPs 
> --
>
> Key: KAFKA-1809
> URL: https://issues.apache.org/jira/browse/KAFKA-1809
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-1809.patch, KAFKA-1809.v1.patch, 
> KAFKA-1809.v2.patch, KAFKA-1809.v3.patch, 
> KAFKA-1809_2014-12-25_11:04:17.patch, KAFKA-1809_2014-12-27_12:03:35.patch, 
> KAFKA-1809_2014-12-27_12:22:56.patch, KAFKA-1809_2014-12-29_12:11:36.patch, 
> KAFKA-1809_2014-12-30_11:25:29.patch, KAFKA-1809_2014-12-30_18:48:46.patch, 
> KAFKA-1809_2015-01-05_14:23:57.patch, KAFKA-1809_2015-01-05_20:25:15.patch, 
> KAFKA-1809_2015-01-05_21:40:14.patch, KAFKA-1809_2015-01-06_11:46:22.patch, 
> KAFKA-1809_2015-01-13_18:16:21.patch, KAFKA-1809_2015-01-28_10:26:22.patch, 
> KAFKA-1809_2015-02-02_11:55:16.patch, KAFKA-1809_2015-02-03_10:45:31.patch, 
> KAFKA-1809_2015-02-03_10:46:42.patch, KAFKA-1809_2015-02-03_10:52:36.patch, 
> KAFKA-1809_2015-03-16_09:02:18.patch, KAFKA-1809_2015-03-16_09:40:49.patch, 
> KAFKA-1809_2015-03-27_15:04:10.patch
>
>
> The goal is to eventually support different security mechanisms on different 
> ports. 
> Currently brokers are defined as host+port pair, and this definition exists 
> throughout the code-base, therefore some refactoring is needed to support 
> multiple ports for a single broker.
> The detailed design is here: 
> https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385996#comment-14385996
 ] 

Neha Narkhede commented on KAFKA-2046:
--

[~onurkaraman] Shouldn't controller.message.queue.size be infinite? It seems 
that the moment there is some backlog of state changes on the controller, it 
will deadlock causing bad things to happen.

> Delete topic still doesn't work
> ---
>
> Key: KAFKA-2046
> URL: https://issues.apache.org/jira/browse/KAFKA-2046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clark Haskins
>Assignee: Onur Karaman
>
> I just attempted to delete a 128 partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> The topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385999#comment-14385999
 ] 

Neha Narkhede commented on KAFKA-527:
-

[~guozhang] +1. Will be great to re-run the test after your patch.

> Compression support does numerous byte copies
> -
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Reporter: Jay Kreps
>Assignee: Yasuhiro Matsuda
>Priority: Critical
> Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, 
> KAFKA-527_2015-03-25_12:08:00.patch, KAFKA-527_2015-03-25_13:26:36.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-11- Authorization design for kafka security

2015-03-29 Thread Parth Brahmbhatt
Hi Gwen,

Thanks a lot for taking the time to review this. I have tried to address all 
your questions below.

Thanks
Parth
On 3/28/15, 8:08 PM, "Gwen Shapira" <gshap...@cloudera.com> wrote:

Preparing for Tuesday meeting, I went over the KIP :)

First, Parth did an amazing job, the KIP is fantastic - detailed and
readable. Thank you!

Second, I have a long list of questions :) No objections, just some
things I'm unclear on and random minor comments. In general, I like
the design, I just feel I'm missing parts of the picture.

1. "Yes, Create topic will have an optional acls, the output of
describe will display owner and acls and alter topic will allow to
modify the acls.” - it will be nice to see what the CLI will look like.

  *   I will modify the KIP but I was going to add "--acl " to 
create-topic and alter-topic.

2. I like the addition of Topic owner. We made the mistake of
forgetting about it when adding authorization to Sqoop2. We probably
want to add a “chown” command to the topic commands.

  *   Again we can add "--owner " to alter topic.

3. "Kafka server will read "authorizer.class” config value at startup
time, create an instance of the specified class and call initialize
method."
We’ll need to validate that users specify only one of those.

  *   The config type will be string so type validation should take care of it.

4. "One added assumption is that on non-secure connections the session
will have principal set to an object whose name() method will return
"Anonymous”."
Can we keep DrWho? :)

  *   Sure, it's up to you actually, as you are the owner of the jira that 
introduces the session concept.

5. "For cluster actions that do not apply to a specific topic like
CREATE we have 2 options. We can either add a broker config called
broker.acls which will point to a json file. This file will be
available on all broker hosts and authorizer will read the acls on
initialization and keep refreshing it every X minutes. Any changes
will require re-distribution of the acl json file. Alternatively we
can add a zookeeper path /brokers/acls and store the acl json as data.
Authorizer can refresh the acl from json every X minutes. In absence
of broker acls the authorizer will fail open, in other words it will
allow all users from all hosts to perform all cluster actions”
I prefer a file to ZK - since that's where we store all user-defined
configurations for now. Everyone knows how to secure a file system :)

  *   I will let everyone vote, file system is fine by me.

6. "When an Acl is missing , this implementation will always fail open
for backward compatibility. “ <- agree, but we need to document that
this makes the default authorizer non-secure

  *   Sure.

7. "If the value of authorizer.class.name is null, in secure mode the
cluster will fail with ConfigException. In non secure mode in absence
of config value forauthorizer.class.name the server will allow all
requests to all topics that , even if the topic has configured acls”
<- I don’t think Kafka has “secure mode” - it can support SSL and
plaintext (un-authenticated) on two different ports simultaneously. I
don’t object to adding such configuration, but we need to decide
exactly what it means.

  *   This is one area of confusion so I will add an open question.

8. "Currently all topic creation/modification/deletion actions are
performed using KafkaAdminUtil which mostly interacts directly with
zookeeper instead of forwarding requests to a broker host. Given all
the code is executed on client side there is no easy way to perform
authorization. “ - since we are adding the admin protocol requests in
KIP-4, perhaps addressing those makes sense.

  *   Yes, We will have to wait for KIP-4 to be delivered.

9. I didn’t see a specification of what a “resource” is; I assume it’s an
enum with things like Topic and… ?

  *   Resource is not an enum; right now it’s just the name of the topic. Currently 
we will not be able to support cluster actions like CREATE, in which case the 
resource can be some constant string like “Kafka-Cluster”.

10. I’m also unclear on where and how “PermissionType” will be used
and what “DENY takes precedence” means here.

  *   I originally did not have the “PermissionType” but someone suggested in 
the DISCUSS thread that we should add support for ALLOW and DENY acls. The use 
case is to support acls like “Allow user U to perform READ from all hosts 
except Host1 and Host2”. “DENY takes precedence” only means that if you have 
defined 2 ACLs for the same user/resource/operation, the DENY acl will take 
effect, i.e. “Allow user X to read from *” and “Deny user X to read from host1 
and host2” together satisfy the use case described in the previous statement 
(see the sketch below).
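
To make the precedence rule concrete, here is an illustrative, self-contained 
sketch of the evaluation order being described (the types and method names are 
hypothetical, not from the KIP):

{code}
import java.util.Arrays;
import java.util.List;

// Illustrative only: a matching DENY wins over any matching ALLOW
// for the same user/operation.
public class AclPrecedenceExample {
  enum PermissionType { ALLOW, DENY }

  static class Acl {
    final String user, host, operation;
    final PermissionType type;
    Acl(String user, String host, String operation, PermissionType type) {
      this.user = user; this.host = host; this.operation = operation; this.type = type;
    }
    boolean matches(String u, String h, String op) {
      return user.equals(u) && operation.equals(op) && (host.equals("*") || host.equals(h));
    }
  }

  static boolean authorize(List<Acl> acls, String user, String host, String op) {
    boolean allowed = false;
    for (Acl acl : acls) {
      if (!acl.matches(user, host, op)) continue;
      if (acl.type == PermissionType.DENY) return false; // DENY takes precedence
      allowed = true;
    }
    return allowed;
  }

  public static void main(String[] args) {
    // "Allow user U to READ from all hosts except Host1 and Host2"
    List<Acl> acls = Arrays.asList(
        new Acl("U", "*", "READ", PermissionType.ALLOW),
        new Acl("U", "host1", "READ", PermissionType.DENY),
        new Acl("U", "host2", "READ", PermissionType.DENY));
    System.out.println(authorize(acls, "U", "host3", "READ")); // true
    System.out.println(authorize(acls, "U", "host1", "READ")); // false
  }
}
{code}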

11. "What does acls on zookeeper node look like given all our admin
APIs are currently performed directly from client?” <- “secure mode”
Kafka will need to set ACLs on ZK (using ZK’s model of ACLs) and both
Kafka and everyone else will need to use them (this is limited to
Kerberos authentication

[jira] [Commented] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-03-29 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386002#comment-14386002
 ] 

Jiangjie Qin commented on KAFKA-2029:
-

[~dmitrybugaychenko] I was trying to apply the patch to the latest trunk but 
failed. Because there have been some code changes since 0.8.1, it would be 
better to create the patch based on the latest trunk. Could you rebase? Also, I 
was not able to find the rb... sorry for not making it clear, you can follow the 
steps at the following link to use kafka-patch-review.py to create a review board:
https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review
Looking forward to the patch :)

> Improving controlled shutdown for rolling updates
> -
>
> Key: KAFKA-2029
> URL: https://issues.apache.org/jira/browse/KAFKA-2029
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>Priority: Critical
> Attachments: KAFKA-2029.patch
>
>
> Controlled shutdown as implemented currently can cause numerous problems: 
> deadlocks, local and global data loss, partitions without a leader, etc. In 
> some cases the only way to restore the cluster is to stop it completely using 
> kill -9 and start it again.
> Note 1: I'm aware of KAFKA-1305, but the proposed workaround to increase the 
> queue size makes things much worse (see the discussion there).
> Note 2: The problems described here can occur in any setup, but they are 
> extremely painful in a setup with large brokers (36 disks, 1000+ assigned 
> partitions per broker in our case).
> Note 3: These improvements are actually workarounds and it is worth 
> considering a global refactoring of the controller (make it single threaded, 
> or even get rid of it in favour of ZK leader elections for partitions).
> The problems and improvements are:
> # Controlled shutdown takes a long time (10+ minutes): the broker sends 
> multiple shutdown requests, finally considers the shutdown failed and 
> proceeds to an unclean shutdown, and the controller gets stuck for a while 
> (holding a lock while waiting for free space in the controller-to-broker 
> queue). After the broker comes back up it receives follower requests and 
> erases highwatermarks (with a message that "replica does not exist" - the 
> controller hadn't yet sent a request with the replica assignment), then when 
> the controller starts replicas on the broker it deletes all local data (due 
> to the missing highwatermarks). Furthermore, the controller starts processing 
> the pending shutdown request and stops replicas on the broker, leaving it in 
> a non-functional state. A solution might be to increase the time the broker 
> waits for the controller's response to the shutdown request, but this timeout 
> is taken from controller.socket.timeout.ms, which is global for all 
> broker-controller communication, and setting it to 30 minutes is dangerous. 
> *Proposed solution: introduce a dedicated config parameter for this timeout 
> with a high default*.
> # If a broker goes down during controlled shutdown and does not come back, 
> the controller gets stuck in a deadlock (one thread owns the lock and tries 
> to add a message to the dead broker's queue, the send thread is in an 
> infinite loop retrying the message to the dead broker, and the broker failure 
> handler is waiting for the lock). There are numerous partitions without a 
> leader and the only way out is to kill -9 the controller. *Proposed solution: 
> add a timeout for adding messages to the broker's queue*. 
> ControllerChannelManager.sendRequest:
> {code}
>   def sendRequest(brokerId: Int, request: RequestOrResponse, callback: (RequestOrResponse) => Unit = null) {
>     brokerLock synchronized {
>       val stateInfoOpt = brokerStateInfo.get(brokerId)
>       stateInfoOpt match {
>         case Some(stateInfo) =>
>           // ODKL Patch: prevent infinite hang on trying to send message to a dead broker.
>           // TODO: Move timeout to config
>           if (!stateInfo.messageQueue.offer((request, callback), 10, TimeUnit.SECONDS)) {
>             error("Timed out trying to send message to broker " + brokerId.toString)
>             // Do not throw, as it brings controller into completely non-functional state
>             // "Controller to broker state change requests batch is not empty while creating a new one"
>             //throw new IllegalStateException("Timed out trying to send message to broker " + brokerId.toString)
>           }
>         case None =>
>           warn("Not sending request %s to broker %d, since it is offline.".format(request, brokerId))
>       }
>     }
>   }
> {code}
> # When the broker which is the controller starts shutting down, if auto leader 
> rebalance is running it deadlocks in the end (shutdown thread owns the lock an

Re: Review Request 31967: Patch for KAFKA-1546

2015-03-29 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/#review78166
---

Ship it!


Ship It!

- Neha Narkhede


On March 27, 2015, 6:58 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31967/
> ---
> 
> (Updated March 27, 2015, 6:58 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1546
> https://issues.apache.org/jira/browse/KAFKA-1546
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> PATCH for KAFKA-1546
> 
> 
> PATCH for KAFKA-1546
> 
> Brief summary of changes:
> - Added a lagBegin metric inside Replica to track the lag in terms of time 
> since the replica last read from the LEO
> - Using the lag begin value in the check for ISR expand and shrink
> - Removed the max lag messages config since it is no longer necessary
> - Returning the initialLogEndOffset in LogReadResult corresponding to the 
> LEO before actually reading from the log.
> - Unit test cases to test ISR shrinkage and expansion
> 
> Updated KAFKA-1546 patch to reflect Neha and Jun's comments
> 
> 
> Addressing Joel's comments
> 
> 
> Addressing Jun and Guozhang's comments
> 
> 
> Addressing Jun's comments
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/cluster/Partition.scala 
> c4bf48a801007ebe7497077d2018d6dffe1677d4 
>   core/src/main/scala/kafka/cluster/Replica.scala 
> bd13c20338ce3d73113224440e858a12814e5adb 
>   core/src/main/scala/kafka/log/Log.scala 
> 06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d 
>   core/src/main/scala/kafka/server/FetchDataInfo.scala 
> 26f278f9b75b1c9c83a720ca9ebd8ab812d19d39 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 422451aec5ea0442eb2e4c1ae772885b813904a9 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> 44f0026e6676d8d764dd59dbcc6bb7bb727a3ba6 
>   core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
> 92152358c95fa9178d71bd1c079af0a0bd8f1da8 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 191251d1340b5e5b2d649b37af3c6c1896d07e6e 
>   core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
> 92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 
> 
> Diff: https://reviews.apache.org/r/31967/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386014#comment-14386014
 ] 

Onur Karaman commented on KAFKA-2046:
-

Correct. Basically Clark and I had two separate occurrences of delete topic not 
working:
1. We're still trying to reproduce Clark's occurrence.
2. I ran into a deadlock that came from controller.message.queue.size being 
overridden to a small value. It was addressed on 3/26 by just using the default 
value of Integer.MAX_VALUE.

> Delete topic still doesn't work
> ---
>
> Key: KAFKA-2046
> URL: https://issues.apache.org/jira/browse/KAFKA-2046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clark Haskins
>Assignee: Onur Karaman
>
> I just attempted to delete a 128 partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> The topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386015#comment-14386015
 ] 

Sriharsha Chintalapani commented on KAFKA-2046:
---

[~nehanarkhede] [~onurkaraman] In 0.8.2 we made 
controller.message.queue.size infinite (Int.MaxValue) by default because 
of these issues.

> Delete topic still doesn't work
> ---
>
> Key: KAFKA-2046
> URL: https://issues.apache.org/jira/browse/KAFKA-2046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clark Haskins
>Assignee: Onur Karaman
>
> I just attempted to delete a 128 partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> The topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1848) Checking shutdown during each iteration of ZookeeperConsumerConnector

2015-03-29 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1848:

Fix Version/s: (was: 0.9.0)
   0.8.3

> Checking shutdown during each iteration of ZookeeperConsumerConnector
> -
>
> Key: KAFKA-1848
> URL: https://issues.apache.org/jira/browse/KAFKA-1848
> Project: Kafka
>  Issue Type: Bug
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Fix For: 0.8.3
>
>
> In ZookeeperConsumerConnector the syncedRebalance() method checks the 
> isShuttingDown flag before it triggers a rebalance. However, it does not 
> recheck the same value between successive retries which is possible if the 
> consumer is shutting down.
> This acquires the rebalanceLock and blocks shutdown from completing.
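>
> A minimal sketch of the pattern being described (illustrative Java, not the 
> actual ZookeeperConsumerConnector code; names are made up):
>
> {code}
> import java.util.concurrent.atomic.AtomicBoolean;
>
> public class RebalanceRetryExample {
>   private final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
>   private final int maxRebalanceRetries = 4;
>
>   // Buggy shape: the flag is consulted once, so retries continue during shutdown.
>   void syncedRebalanceBuggy() {
>     if (isShuttingDown.get()) return;
>     for (int i = 0; i < maxRebalanceRetries; i++) {
>       if (tryRebalance()) return;
>     }
>   }
>
>   // Fixed shape: re-check the flag on every retry so shutdown can complete.
>   void syncedRebalanceFixed() {
>     for (int i = 0; i < maxRebalanceRetries && !isShuttingDown.get(); i++) {
>       if (tryRebalance()) return;
>     }
>   }
>
>   private boolean tryRebalance() { return false; } // stand-in for the real rebalance
> }
> {code}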



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386021#comment-14386021
 ] 

Neha Narkhede commented on KAFKA-2046:
--

[~onurkaraman] Could [~clarkhaskins] reproduce the delete topic issue once you 
changed the controller.message.queue.size to its default size (infinite)?

> Delete topic still doesn't work
> ---
>
> Key: KAFKA-2046
> URL: https://issues.apache.org/jira/browse/KAFKA-2046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clark Haskins
>Assignee: Onur Karaman
>
> I just attempted to delete a 128 partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> The topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-29 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386019#comment-14386019
 ] 

Jiangjie Qin commented on KAFKA-1716:
-

[~dchu] Do you mean that the fetchers have never been created? That's a good 
point, but I still do not totally understand the cause.
The first rebalance of ZookeeperConsumerConnector occurs when KafkaStreams are 
created. That means you need to specify a topic count map and create streams, 
so the leader finder thread will send a TopicMetadataRequest to the brokers to 
get back the topic metadata for the topic. By default auto topic creation is 
enabled on Kafka brokers. That means when a broker sees a TopicMetadataRequest 
asking for a topic that does not exist yet, it will create it and return the 
topic metadata. So the consumer fetcher thread will be created for the topic on 
ZookeeperConsumerConnector. However, if auto topic creation is turned off, your 
description looks possible.
About the shutdown issue: you are right, that is an issue that has been fixed 
in KAFKA-1848, but it seems it was not included in 0.8.2. I just changed the 
fix version from 0.9.0 to 0.8.3 instead.

> hang during shutdown of ZookeeperConsumerConnector
> --
>
> Key: KAFKA-1716
> URL: https://issues.apache.org/jira/browse/KAFKA-1716
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.1.1
>Reporter: Sean Fay
>Assignee: Neha Narkhede
> Attachments: after-shutdown.log, before-shutdown.log, 
> kafka-shutdown-stuck.log
>
>
> It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to 
> wedge in the case that some consumer fetcher threads receive messages during 
> the shutdown process.
> Shutdown thread:
> {code}-- Parking to wait for: 
> java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
> at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> at 
> kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
> at 
> scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
> at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
> at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
> at 
> scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
> ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
> at 
> kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
> at 
> kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
> at 
> kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167){code}
> ConsumerFetcherThread:
> {code}-- Parking to wait for: 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at 
> java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
> at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
> at 
> kafka/consumer/ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
> at 
> kafka/server/AbstractFetcherTh

[jira] [Issue Comment Deleted] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-29 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1716:

Comment: was deleted

(was: [~dchu] Do you mean that the fetchers have never been created? That's a 
good point, but I still do not totally understand the cause.
The first rebalance of ZookeeperConsumerConnector occurs when KafkaStreams are 
created. That means you need to specify a topic count map and create streams, 
so the leader finder thread will send a TopicMetadataRequest to the brokers to 
get back the topic metadata for the topic. By default auto topic creation is 
enabled on Kafka brokers. That means when a broker sees a TopicMetadataRequest 
asking for a topic that does not exist yet, it will create it and return the 
topic metadata. So the consumer fetcher thread will be created for the topic on 
ZookeeperConsumerConnector. However, if auto topic creation is turned off, your 
description looks possible.
About the shutdown issue: you are right, that is an issue that has been fixed 
in KAFKA-1848, but it seems it was not included in 0.8.2. I just changed the 
fix version from 0.9.0 to 0.8.3 instead.)

> hang during shutdown of ZookeeperConsumerConnector
> --
>
> Key: KAFKA-1716
> URL: https://issues.apache.org/jira/browse/KAFKA-1716
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.1.1
>Reporter: Sean Fay
>Assignee: Neha Narkhede
> Attachments: after-shutdown.log, before-shutdown.log, 
> kafka-shutdown-stuck.log
>
>
> It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to 
> wedge in the case that some consumer fetcher threads receive messages during 
> the shutdown process.
> Shutdown thread:
> {code}-- Parking to wait for: 
> java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
> at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> at 
> kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
> at 
> scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
> at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
> at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
> at 
> scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
> ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
> at 
> kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
> at 
> kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
> at 
> kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167){code}
> ConsumerFetcherThread:
> {code}-- Parking to wait for: 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at 
> java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
> at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
> at 
> kafka/consumer/ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
> at 
> kafka/server/AbstractFetcherThread$$anonfun$processFetc

[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-29 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386022#comment-14386022
 ] 

Jiangjie Qin commented on KAFKA-1716:
-

[~dchu] Do you mean that the fetchers have never been created? That's a good 
point, but I still do not totally understand the cause.
The first rebalance of ZookeeperConsumerConnector occurs when KafkaStreams are 
created. That means you need to specify a topic count map and create streams, 
so the leader finder thread will send a TopicMetadataRequest to the brokers to 
get back the topic metadata for the topic. By default auto topic creation is 
enabled on Kafka brokers. That means when a broker sees a TopicMetadataRequest 
asking for a topic that does not exist yet, it will create it and return the 
topic metadata. So the consumer fetcher thread will be created for the topic on 
ZookeeperConsumerConnector. However, if auto topic creation is turned off, your 
description looks possible.
About the shutdown issue: you are right, that is an issue that has been fixed 
in KAFKA-1848, but it seems it was not included in 0.8.2. I just changed the 
fix version from 0.9.0 to 0.8.3 instead.

> hang during shutdown of ZookeeperConsumerConnector
> --
>
> Key: KAFKA-1716
> URL: https://issues.apache.org/jira/browse/KAFKA-1716
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.1.1
>Reporter: Sean Fay
>Assignee: Neha Narkhede
> Attachments: after-shutdown.log, before-shutdown.log, 
> kafka-shutdown-stuck.log
>
>
> It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to 
> wedge in the case that some consumer fetcher threads receive messages during 
> the shutdown process.
> Shutdown thread:
> {code}-- Parking to wait for: 
> java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
> at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> at 
> kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
> at 
> kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
> at 
> scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
> at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
> at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
> at 
> scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
> ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
> at 
> kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
> at 
> kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
> at 
> kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167){code}
> ConsumerFetcherThread:
> {code}-- Parking to wait for: 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
> at jrockit/vm/Locks.park0(J)V(Native Method)
> at jrockit/vm/Locks.park(Locks.java:2230)
> at sun/misc/Unsafe.park(ZJ)V(Native Method)
> at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
> at 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at 
> java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
> at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
> at 
> kafka/consumer/ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
> at 
> kafka/server/AbstractFetcherTh

[jira] [Commented] (KAFKA-1983) TestEndToEndLatency can be unreliable after hard kill

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386028#comment-14386028
 ] 

Neha Narkhede commented on KAFKA-1983:
--

cc [~junrao]

> TestEndToEndLatency can be unreliable after hard kill
> -
>
> Key: KAFKA-1983
> URL: https://issues.apache.org/jira/browse/KAFKA-1983
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Grayson Chao
>  Labels: newbie
>
> If you hard kill TestEndToEndLatency, the committed offset remains the last 
> checkpointed one. However, more messages are now appended after the last 
> checkpointed offset. When restarting TestEndToEndLatency, the consumer 
> resumes from the last checkpointed offset and will report really low latency 
> since it doesn't need to wait for a new message to be produced to read the 
> next message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Labels: newbie  (was: )

> sourceCompatibility not set in Kafka build.gradle
> -
>
> Key: KAFKA-2034
> URL: https://issues.apache.org/jira/browse/KAFKA-2034
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2.1
>Reporter: Derek Bassett
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The build.gradle does not explicitly set the sourceCompatibility version. 
> This allows Kafka, when built with Java 1.8, to set the wrong version on the 
> class files. It would also allow Java 1.8 features to be merged into Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Component/s: build

> sourceCompatibility not set in Kafka build.gradle
> -
>
> Key: KAFKA-2034
> URL: https://issues.apache.org/jira/browse/KAFKA-2034
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2.1
>Reporter: Derek Bassett
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The build.gradle does not explicitly set the sourceCompatibility version. 
> This allows Kafka, when built with Java 1.8, to set the wrong version on the 
> class files. It would also allow Java 1.8 features to be merged into Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386037#comment-14386037
 ] 

Neha Narkhede commented on KAFKA-1215:
--

[~allenxwang] This was inactive for a while, but I think it will be good to 
wait until KAFKA-1792 is done to propose a solution for rack-awareness. 

> Rack-Aware replica assignment option
> 
>
> Key: KAFKA-1215
> URL: https://issues.apache.org/jira/browse/KAFKA-1215
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: Joris Van Remoortere
>Assignee: Jun Rao
> Fix For: 0.9.0
>
> Attachments: rack_aware_replica_assignment_v1.patch, 
> rack_aware_replica_assignment_v2.patch
>
>
> Adding a rack-id to kafka config. This rack-id can be used during replica 
> assignment by using the max-rack-replication argument in the admin scripts 
> (create topic, etc.). By default the original replication assignment 
> algorithm is used because max-rack-replication defaults to -1. 
> max-rack-replication > -1 is not honored if you are doing manual replica 
> assignment (preferred).
> If this looks good I can add some test cases specific to the rack-aware 
> assignment.
> I can also port this to trunk. We are currently running 0.8.0 in production 
> and need this, so I wrote the patch against that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1293) Mirror maker housecleaning

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386039#comment-14386039
 ] 

Neha Narkhede commented on KAFKA-1293:
--

cc [~junrao]

> Mirror maker housecleaning
> --
>
> Key: KAFKA-1293
> URL: https://issues.apache.org/jira/browse/KAFKA-1293
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1
>Reporter: Jay Kreps
>Priority: Minor
>  Labels: usability
> Attachments: KAFKA-1293.patch
>
>
> Mirror maker uses its own convention for command-line arguments, e.g. 
> --num.producers, whereas everywhere else follows the unix convention like 
> --num-producers. This is annoying because when running different tools you 
> have to constantly remember the quirks of whoever wrote each tool.
> Mirror maker should also have a top-level wrapper script in bin/ to make tab 
> completion work and so you don't have to remember the fully qualified class 
> name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1293) Mirror maker housecleaning

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386039#comment-14386039
 ] 

Neha Narkhede edited comment on KAFKA-1293 at 3/30/15 12:15 AM:


cc [~junrao] who can help with the access to the wiki.


was (Author: nehanarkhede):
cc [~junrao]

> Mirror maker housecleaning
> --
>
> Key: KAFKA-1293
> URL: https://issues.apache.org/jira/browse/KAFKA-1293
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1
>Reporter: Jay Kreps
>Priority: Minor
>  Labels: usability
> Attachments: KAFKA-1293.patch
>
>
> Mirror maker uses its own convention for command-line arguments, e.g. 
> --num.producers, whereas everywhere else follows the unix convention like 
> --num-producers. This is annoying because when running different tools you 
> have to constantly remember the quirks of whoever wrote each tool.
> Mirror maker should also have a top-level wrapper script in bin/ to make tab 
> completion work and so you don't have to remember the fully qualified class 
> name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1983) TestEndToEndLatency can be unreliable after hard kill

2015-03-29 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386040#comment-14386040
 ] 

Sriharsha Chintalapani commented on KAFKA-1983:
---

[~gchao] Here are the steps
1) start a single node kafka server 
2)  create a topic "test"
3) ./bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency localhost:9092 
localhost:2181 test 10 1000 1
(USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect 
topic num_messages consumer_fetch_max_wait producer_acks)
4) hard kill the TestEndToEndLatency
5) restarting step 3 causes it to report low latency.

> TestEndToEndLatency can be unreliable after hard kill
> -
>
> Key: KAFKA-1983
> URL: https://issues.apache.org/jira/browse/KAFKA-1983
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Grayson Chao
>  Labels: newbie
>
> If you hard kill TestEndToEndLatency, the committed offset remains the last 
> checkpointed one. However, more messages are now appended after the last 
> checkpointed offset. When restarting TestEndToEndLatency, the consumer 
> resumes from the last checkpointed offset and will report really low latency 
> since it doesn't need to wait for a new message to be produced to read the 
> next message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1548) Refactor the "replica_id" in requests

2015-03-29 Thread Santosh Pingale (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386050#comment-14386050
 ] 

Santosh Pingale commented on KAFKA-1548:


[~gwenshap]/[~guozhang]

Can I pick this up?

> Refactor the "replica_id" in requests
> -
>
> Key: KAFKA-1548
> URL: https://issues.apache.org/jira/browse/KAFKA-1548
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Gwen Shapira
>  Labels: newbie
> Fix For: 0.9.0
>
>
> Today, in many requests like fetch and offset we have an integer replica_id 
> field: if the request is from a follower it is the broker id of that follower 
> replica; if it is from a regular consumer it can be one of two values, "-1" 
> for an ordinary consumer or "-2" for a debugging consumer. Hence this 
> replica_id field is used in two ways:
> 1) Logging for troubleshooting in request logs, which is helpful only when 
> the request is from a follower replica,
> 2) Deciding whether the request is from a consumer or a replica so it can be 
> handled differently. For this purpose we do not really care about the actual 
> id value.
> We probably would like to make the following improvements:
> 1) Rename "replica_id" to something less confusing?
> 2) Change the request.toString() function based on the replica_id, whether it 
> is a positive integer (meaning from a broker replica fetcher) or -1/-2 
> (meaning from a regular consumer).
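>
> An illustrative sketch of the two-fold use being described (hypothetical 
> helper, not actual Kafka code):
>
> {code}
> public class ReplicaIdExample {
>   static final int ORDINARY_CONSUMER = -1;
>   static final int DEBUGGING_CONSUMER = -2;
>
>   // Anything other than -1/-2 is treated as the broker id of a follower replica.
>   static boolean isFromFollower(int replicaId) {
>     return replicaId != ORDINARY_CONSUMER && replicaId != DEBUGGING_CONSUMER;
>   }
>
>   public static void main(String[] args) {
>     System.out.println(isFromFollower(3));                  // true: follower broker 3
>     System.out.println(isFromFollower(ORDINARY_CONSUMER));  // false: regular consumer
>   }
> }
> {code}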



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 32519: Patch for KAFKA-2050

2015-03-29 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32519/#review78167
---

Ship it!


Ship It!

- Neha Narkhede


On March 26, 2015, 2:36 a.m., Tim Brooks wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32519/
> ---
> 
> (Updated March 26, 2015, 2:36 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2050
> https://issues.apache.org/jira/browse/KAFKA-2050
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Avoid calling size() on concurrent queue.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/network/SocketServer.scala 
> 76ce41aed6e04ac5ba88395c4d5008aca17f9a73 
> 
> Diff: https://reviews.apache.org/r/32519/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Tim Brooks
> 
>



[jira] [Updated] (KAFKA-1994) Evaluate performance effect of chroot check on Topic creation

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1994:
-
Reviewer: Jun Rao

> Evaluate performance effect of chroot check on Topic creation
> -
>
> Key: KAFKA-1994
> URL: https://issues.apache.org/jira/browse/KAFKA-1994
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Attachments: KAFKA-1994.patch, KAFKA-1994_2015-03-03_18:19:45.patch
>
>
> KAFKA-1664 adds check for chroot while creating a node in ZK. ZkPath checks 
> if namespace exists before trying to create a path in ZK. This raises a 
> concern that checking namespace for each path creation might be unnecessary 
> and can potentially make creations expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386228#comment-14386228
 ] 

Onur Karaman commented on KAFKA-2046:
-

Yeah the issue still happens on [~clarkhaskins]' cluster with the default 
controller.message.queue.size but I haven't been able to reproduce it on my 
test cluster. We'll try again on his and get the logs and heapdumps to get a 
better idea of what happened.

> Delete topic still doesn't work
> ---
>
> Key: KAFKA-2046
> URL: https://issues.apache.org/jira/browse/KAFKA-2046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clark Haskins
>Assignee: Onur Karaman
>
> I just attempted to delete a 128 partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> The topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)