[jira] [Updated] (KAFKA-2044) Support requests and responses from o.a.k.common in KafkaApis

2015-03-28 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-2044:
---
   Resolution: Fixed
Fix Version/s: 0.8.3
   Status: Resolved  (was: Patch Available)

+1 for the latest patch. Committed to trunk after removing unused imports in 
KafkaApis and unused changes in PartitionInfo.

Thanks a lot for getting this piece of groundwork done!

Do you want to file subjiras in KAFKA-1927 to get the remaining requests 
converted individually?

> Support requests and responses from o.a.k.common in KafkaApis
> -
>
> Key: KAFKA-2044
> URL: https://issues.apache.org/jira/browse/KAFKA-2044
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Fix For: 0.8.3
>
> Attachments: KAFKA-2044.patch, KAFKA-2044_2015-03-25_16:48:01.patch, 
> KAFKA-2044_2015-03-25_16:48:49.patch, KAFKA-2044_2015-03-25_16:53:05.patch, 
> KAFKA-2044_2015-03-25_18:49:24.patch, KAFKA-2044_2015-03-25_19:20:16.patch, 
> KAFKA-2044_2015-03-27_09:22:28.patch
>
>
> As groundwork for KIP-4 and for KAFKA-1927, we'll add some plumbing to 
> support handling of requests and responses from o.a.k.common in KafkaApis.
> This will allow us to add new API calls just in o.a.k.common and to gradually 
> migrate existing requests and responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2045) Memory Management on the consumer

2015-03-28 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385375#comment-14385375
 ] 

Jay Kreps commented on KAFKA-2045:
--

The challenge with a rewrite is that it will be hard to know which 
optimizations actually mattered. It will likely also be just a partial 
implementation (i.e. just the fetch request to a single broker) so it will be 
hard to judge what that would look like or what the performance would be if we 
integrated it in the main client.

My recommendation would be to approach this in a data-driven way instead. We 
have a working client, so let's look at where it actually spends time and then 
improve the things that consume that time.

Example measurements:
1. % of time spent in decompression
2. % of time spent on CRC check
3. % of time spent on GC
4. Etc.
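As a rough illustration of how such a breakdown could be collected, here is a minimal timing harness. The phase names, call sites, and reporting format are assumptions for illustration, not existing client code; only the GC accounting uses a real JDK API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of per-phase timing for the consumer fetch path.
public final class FetchPhaseProfile {
    public enum Phase { DECOMPRESS, CRC, DESERIALIZE, OTHER }

    private final Map<Phase, Long> nanosByPhase = new EnumMap<Phase, Long>(Phase.class);
    private final long wallClockStartNanos = System.nanoTime();

    // Callers wrap each section of interest with System.nanoTime() and record the delta.
    public void record(Phase phase, long elapsedNanos) {
        Long current = nanosByPhase.get(phase);
        nanosByPhase.put(phase, (current == null ? 0L : current) + elapsedNanos);
    }

    // Total time the JVM has spent in GC since startup, in milliseconds (real JMX API).
    public static long gcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
            total += gc.getCollectionTime();
        return total;
    }

    public void report() {
        long wall = System.nanoTime() - wallClockStartNanos;
        for (Map.Entry<Phase, Long> e : nanosByPhase.entrySet())
            System.out.printf("%-12s %5.1f%%%n", e.getKey(), 100.0 * e.getValue() / wall);
        System.out.printf("%-12s %d ms total%n", "GC", gcTimeMillis());
    }
}
```

Usage would be calls like profile.record(Phase.CRC, elapsedNanos) wrapped around the corresponding sections of the fetch path.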

It would be easy to implement a slightly broken buffer reuse approach that just 
preallocated a largish number of buffers bigger than the size needed for 
experimental purposes and reused those. This would only work for the purposes 
of the perf test but would let us determine the impact of a more complete 
pooling implementation quickly.
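A minimal sketch of that kind of throwaway pool, assuming fixed, oversized buffers and no handling of requests larger than the preallocated size (the class name and sizing policy are hypothetical):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Rough sketch of the "slightly broken" experiment described above: preallocate a fixed
// number of oversized buffers up front and recycle them. Good enough for a perf test,
// not for production use.
public final class NaiveBufferPool {
    private final BlockingQueue<ByteBuffer> free;

    public NaiveBufferPool(int bufferCount, int bufferSize) {
        this.free = new ArrayBlockingQueue<ByteBuffer>(bufferCount);
        for (int i = 0; i < bufferCount; i++)
            free.add(ByteBuffer.allocate(bufferSize));
    }

    // Blocks until a buffer is available; clears it so the caller sees an empty buffer.
    public ByteBuffer acquire() throws InterruptedException {
        ByteBuffer buffer = free.take();
        buffer.clear();
        return buffer;
    }

    // Callers must return buffers when done; leaked buffers eventually exhaust the pool.
    public void release(ByteBuffer buffer) {
        free.offer(buffer);
    }
}
```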

> Memory Management on the consumer
> -
>
> Key: KAFKA-2045
> URL: https://issues.apache.org/jira/browse/KAFKA-2045
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>
> We need to add the memory management on the new consumer like we did in the 
> new producer. This would probably include:
> 1. byte buffer re-usage for fetch response partition data.
> 2. byte buffer re-usage for on-the-fly de-compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-936) Kafka Metrics Memory Leak

2015-03-28 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps resolved KAFKA-936.
-
Resolution: Fixed

> Kafka Metrics Memory Leak 
> --
>
> Key: KAFKA-936
> URL: https://issues.apache.org/jira/browse/KAFKA-936
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
> Environment: centos linux, jdk 1.6, jboss
>Reporter: Senthil Chittibabu
>Assignee: Neha Narkhede
>Priority: Critical
>
> I am using kafka_2.8.0-0.8.0-SNAPSHOT version. I am running into 
> OutOfMemoryError in PermGen Space. I have set the -XX:MaxPermSize=512m, but I 
> still get the same error. I used profiler to trace the memory leak, and found 
> the following kafka classes to be the cause for the memory leak. Please let 
> me know if you need any additional information to debug this issue. 
> kafka.server.FetcherLagMetrics
> kafka.consumer.FetchRequestAndResponseMetrics
> kafka.consumer.FetchRequestAndResponseStats
> kafka.metrics.KafkaTimer
> kafka.utils.Pool



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2063) Bound fetch response size

2015-03-28 Thread Jay Kreps (JIRA)
Jay Kreps created KAFKA-2063:


 Summary: Bound fetch response size
 Key: KAFKA-2063
 URL: https://issues.apache.org/jira/browse/KAFKA-2063
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps


Currently the only bound on the fetch response size is 
max.partition.fetch.bytes * num_partitions. There are two problems:
1. First, this bound is often large. You may choose max.partition.fetch.bytes=1MB 
to enable messages of up to 1MB. However, if you also need to consume 1k 
partitions, this means you may receive a 1GB response in the worst case!
2. The actual memory usage is unpredictable. Partition assignment changes, and 
you only actually get the full fetch amount when you are behind and there is a 
full chunk of data ready. This means an application that seems to work fine 
will suddenly OOM when partitions shift or when the application falls behind.

We need to decouple the fetch response size from the number of partitions.

The proposal for doing this would be to add a new field to the fetch request, 
max_bytes which would control the maximum data bytes we would include in the 
response.

The implementation on the server side would grab data from each partition in 
the fetch request until it hit this limit, then send back just the data for the 
partitions that fit in the response. The implementation would need to start 
from a random position in the list of topics included in the fetch request to 
ensure that in a case of backlog we fairly balance between partitions (to avoid 
first giving just the first partition until that is exhausted, then the next 
partition, etc).
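A rough sketch of that server-side selection logic (illustrative only, not the broker's actual code; the types and the simple "stop when the budget is exhausted" rule are assumptions):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Given the partitions named in a fetch request and the bytes currently ready for each
// (represented here as plain byte arrays), pick a random starting index and accumulate
// partitions in order until the response-level max_bytes budget is exhausted.
final class FetchResponseLimiter {
    static <P> Map<P, byte[]> selectPartitions(List<P> requestedPartitions,
                                               Map<P, byte[]> readyData,
                                               int maxResponseBytes) {
        Map<P, byte[]> selected = new LinkedHashMap<P, byte[]>();
        int n = requestedPartitions.size();
        if (n == 0)
            return selected;
        // Random rotation so that, under backlog, we don't always drain the first
        // partitions in the request and starve the rest.
        int start = ThreadLocalRandom.current().nextInt(n);
        int remainingBudget = maxResponseBytes;
        for (int i = 0; i < n; i++) {
            P partition = requestedPartitions.get((start + i) % n);
            byte[] data = readyData.get(partition);
            if (data == null)
                continue;
            if (data.length > remainingBudget)
                break; // budget exhausted; remaining partitions wait for the next fetch
            selected.put(partition, data);
            remainingBudget -= data.length;
        }
        return selected;
    }
}
```

A real implementation would also need to guarantee progress when a single partition's chunk is larger than max_bytes (e.g. always include at least one chunk), which is glossed over here.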

This setting will make the max.partition.fetch.bytes field in the fetch request 
much less useful, and we should discuss just getting rid of it.

I believe this also solves the same thing we were trying to address in 
KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
be compared to the max message size. This can be much larger--e.g. a 50MB 
max_bytes setting would be okay, whereas now if you set 50MB you may need to 
allocate 50MB*num_partitions.

This will require evolving the fetch request protocol version to add the new 
field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2045) Memory Management on the consumer

2015-03-28 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385387#comment-14385387
 ] 

Jay Kreps commented on KAFKA-2045:
--

Okay, so since most of the discussion here is on optimizing memory allocation, let's 
use this ticket for that.

I filed KAFKA-2063 to cover bounding the fetch response size. I think that is a 
prerequisite for this ticket and also a bigger problem (those big allocations 
actually often cause OOM).

> Memory Management on the consumer
> -
>
> Key: KAFKA-2045
> URL: https://issues.apache.org/jira/browse/KAFKA-2045
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>
> We need to add the memory management on the new consumer like we did in the 
> new producer. This would probably include:
> 1. byte buffer re-usage for fetch response partition data.
> 2. byte buffer re-usage for on-the-fly de-compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : Kafka-trunk #436

2015-03-28 Thread Apache Jenkins Server
See 



Jenkins build is back to normal : KafkaPreCommit #48

2015-03-28 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-2045) Memory Management on the consumer

2015-03-28 Thread Rajiv Kurian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385500#comment-14385500
 ] 

Rajiv Kurian commented on KAFKA-2045:
-

[~jkreps] Totally agree with you on the concerns with a re-write. I am sure 
I'll end up re-using most of the code, otherwise it will take too long in any 
case. But given this is just a prototype, I want the freedom to be able to make 
changes without being bound by the existing architecture  and class hierarchy 
of the client. Even if I do re-implement some of the parts I'll make sure that 
the client can (a) Do metadata requests so it can react to leaders moving etc. 
(b) Actually read from multiple topic/partitions spread across multiple brokers 
and not just a single broker. Again since this is just a rewrite with the sole 
purpose of exploring possible performance improvements there can be mainly two 
consequences:
i) It shows no improvements: In that case we end up not spending too much time 
changing the current code, and the hacky code just gets us to this conclusion 
faster.
ii) It shows interesting improvements: If this were true, we can afford to 
spend some time seeing which things actually improved performance and make a 
call on how to integrate best.

It might be counterproductive to look at the current client implementation and 
look at the % of time spent in each of the bottlenecks because those numbers 
are a consequence of the current memory layout. For example if we do an on the 
fly CRC check and decompression - CRC check time might go up a bit because now 
we are not striding over a contiguous ByteBuffer in one sweep. Right now the 
current client has this pattern: CRC check on Message1 --> CRC check on 
Message2 --> ... --> CRC check on MessageN --> Hand message 1 to consumer --> 
... --> Hand message N to consumer. Instead, with the current proposal we will 
have a pattern of: Do CRC on Message1 --> Hand Message1 to consumer --> Do 
CRC on Message2 --> Hand Message2 to the consumer --> and so on. So the CRC checks are 
separated by potential (certain?) cache floundering during the handling of the 
message by the client. On the other hand from the perspective of the consumer, 
the pattern looks like this -- Do CRC and validation on all messages starting 
with 1 to  N --> Hand messages 1 to N to client. Now by the time the Kafka 
consumer is done with validating and deserializing message N, message 1 is 
possibly already out of the cache. With the new approach since we hand over a 
message right after validating it, we give the consumer a hot in cache message, 
which might improve the consumer processing enough to offset the loss in 
CRC striding efficiency. Or it may not. It might just turn out that doing the 
CRC validation upfront is just a pure win since all the CRC tables will be in 
cache etc and striding access for the CRC math is worth an extra iteration of 
the ByteBuffer contents. But it might still be more profitable to elide 
copies and prevent creation of objects by doing on the fly decoding and handing 
out indexes into the actual response ByteBuffer. This result might further be 
affected by how expensive the deserialization and processing of the message is. 
If the message is a bloated JSON encoded object that is de-serialized into a 
POJO and then processed really slowly then none of this will probably matter. 
On the other hand, if the message is compact and binary encoded and can be 
processed with minimal cache misses, this stuff might add up. My point is that 
basing the TODOs on the current profile may not be optimal because the profile 
is a massive consequence of the current layout and allocation patterns. Also 
the profile will give %s and we might be able to keep the same %s but just 
still reduce the overall time taken for the entire consumer processing cycle. 
Just to belabor the point even further, the current hash map implementations 
might suffer so many cache misses that they mask an underlying improvement 
opportunity for the data in the maps. Switching to compact primitive arrays 
based open hash maps might surface that opportunity again.
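To make the two orderings concrete, here is a toy sketch of the batch-validate-then-deliver pattern versus the validate-and-deliver-interleaved pattern; Message, its stored CRC field, and the handler interface are stand-ins for illustration, not Kafka's actual message classes:

```java
import java.util.List;
import java.util.zip.CRC32;

// Toy illustration of the two access patterns discussed above.
final class CrcOrdering {
    static final class Message {
        final byte[] payload;
        final long expectedCrc;
        Message(byte[] payload, long expectedCrc) {
            this.payload = payload;
            this.expectedCrc = expectedCrc;
        }
    }

    interface Handler { void handle(Message m); }

    static boolean valid(Message m) {
        CRC32 crc = new CRC32();
        crc.update(m.payload, 0, m.payload.length);
        return crc.getValue() == m.expectedCrc;
    }

    // Current pattern: stride over the whole batch doing CRC checks, then hand the
    // messages to the application; message 1 may already be cold in cache by then.
    static void validateThenDeliver(List<Message> batch, Handler handler) {
        for (Message m : batch)
            if (!valid(m)) throw new IllegalStateException("corrupt message");
        for (Message m : batch)
            handler.handle(m);
    }

    // Proposed pattern: validate each message and hand it over immediately, so the
    // application sees a hot-in-cache message at the cost of CRC striding efficiency.
    static void validateAndDeliverInterleaved(List<Message> batch, Handler handler) {
        for (Message m : batch) {
            if (!valid(m)) throw new IllegalStateException("corrupt message");
            handler.handle(m);
        }
    }
}
```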

Is there a performance test that is used to keep track of the new Consumer's 
performance? If so maybe I can wrap that in a JMH suite and re-use that to test 
improvements?
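For reference, the shape of such a JMH wrapper might look like the following skeleton; the consumer wiring is deliberately left as a placeholder since the exact new-consumer API in trunk is not shown here, and the class and method names are assumptions:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

// Skeleton of a JMH harness around one iteration of the existing consumer perf test.
@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class ConsumerPollBenchmark {

    // Placeholder for whatever the existing perf test constructs, e.g. a consumer
    // subscribed to a pre-populated topic on an already-running broker.
    private Runnable pollOnce;

    @Setup
    public void setup() {
        // Assumption: a broker with test data is already running; wire the real
        // consumer here and have pollOnce invoke a single poll/iteration of it.
        pollOnce = new Runnable() {
            @Override
            public void run() { /* one consumer poll would go here */ }
        };
    }

    @Benchmark
    public void pollBatch() {
        pollOnce.run();
    }

    @TearDown
    public void tearDown() {
        // close the consumer here
    }
}
```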

> Memory Management on the consumer
> -
>
> Key: KAFKA-2045
> URL: https://issues.apache.org/jira/browse/KAFKA-2045
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>
> We need to add the memory management on the new consumer like we did in the 
> new producer. This would probably include:
> 1. byte buffer re-usage for fetch response partition data.
> 2. byte buffer re-usage for on-the-fly de-compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2045) Memory Management on the consumer

2015-03-28 Thread Rajiv Kurian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385500#comment-14385500
 ] 

Rajiv Kurian edited comment on KAFKA-2045 at 3/28/15 7:40 PM:
--

[~jkreps] Totally agree with you on the concerns with a re-write. I am sure 
I'll end up re-using most of the code, otherwise it will take too long in any 
case. But given this is just a prototype, I want the freedom to be able to make 
changes without being bound by the existing architecture  and class hierarchy 
of the client. Even if I do re-implement some of the parts I'll make sure that 
the client can (a) Do metadata requests so it can react to leaders moving etc. 
(b) Actually read from multiple topic/partitions spread across multiple brokers 
and not just a single broker. Again since this is just a rewrite with the sole 
purpose of exploring possible performance improvements there can be mainly two 
consequences:
i) It shows no improvements: In that case we end up not spending too much time 
changing the current code, and the hacky code just gets us to this conclusion 
faster.
ii) It shows interesting improvements: If this were true, we can afford to 
spend some time seeing which things actually improved performance and make a 
call on how to integrate best.

It might be counterproductive to look at the current client implementation and 
look at the % of time spent in each of the bottlenecks because those numbers 
are a consequence of the current memory layout. For example if we do an on the 
fly CRC check and decompression - CRC check time might go up a bit because now 
we are not striding over a contiguous ByteBuffer in one sweep. Right now the 
current client has this pattern: CRC check on Message1 --> CRC check on 
Message2 --> ... --> CRC check on MessageN --> Hand message 1 to consumer --> 
... --> Hand message N to consumer. Instead, with the current proposal we will 
have a pattern of: Do CRC on Message1 --> Hand Message1 to consumer --> Do 
CRC on Message2 --> Hand Message2 to the consumer --> and so on. So the CRC checks are 
separated by potential (certain?) cache floundering during the handling of the 
message by the consumer. On the other hand from the perspective of the 
consumer, the pattern looks like this -- Do CRC and validation on all messages 
starting with 1 to  N --> Hand messages 1 to N to client. Now by the time the 
Kafka consumer is done with validating and deserializing message N, message 1 
is possibly already out of the cache. With the new approach since we hand over 
a message right after validating it, we give the consumer a hot in cache 
message, which might improve the consumer processing enough to offset the 
loss in CRC striding efficiency. Or it may not. It might just turn out that 
doing the CRC validation upfront is just a pure win since all the CRC tables 
will be in cache etc and striding access for the CRC math is worth an extra 
iteration of the ByteBuffer contents. But it might still be more profitable 
to elide copies and prevent creation of objects by doing on the fly decoding 
and handing out indexes into the actual response ByteBuffer. This result might 
further be affected by how expensive the deserialization and processing of the 
message is. If the message is a bloated JSON encoded object that is 
de-serialized into a POJO and then processed really slowly then none of this 
will probably matter. On the other hand, if the message is compact and binary 
encoded and can be processed with minimal cache misses, this stuff might add 
up. My point is that basing the TODOs on the current profile may not be optimal 
because the profile is a massive consequence of the current layout and 
allocation patterns. Also the profile will give %s and we might be able to keep 
the same %s but just still reduce the overall time taken for the entire 
consumer processing cycle. Just to belabor the point even further, the current 
hash map implementations might suffer so many cache misses that they mask an 
underlying improvement opportunity for the data in the maps. Switching to 
compact primitive arrays based open hash maps might surface that opportunity 
again.

Is there a performance test that is used to keep track of the new Consumer's 
performance? If so maybe I can wrap that in a JMH suite and re-use that to test 
improvements?


was (Author: rzidane):
[~jkreps] Totally agree with you on the concerns with a re-write. I am sure 
I'll end up re-using most of the code, otherwise it will take too long in any 
case. But given this is just a prototype, I want the freedom to be able to make 
changes without being bound by the existing architecture  and class hierarchy 
of the client. Even if I do re-implement some of the parts I'll make sure that 
the client can (a) Do metadata requests so it can react to leaders moving etc. 
(b) Actually read from multiple topic/partitions 

[jira] [Comment Edited] (KAFKA-2045) Memory Management on the consumer

2015-03-28 Thread Rajiv Kurian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385500#comment-14385500
 ] 

Rajiv Kurian edited comment on KAFKA-2045 at 3/28/15 7:40 PM:
--

[~jkreps] Totally agree with you on the concerns with a re-write. I am sure 
I'll end up re-using most of the code, otherwise it will take too long in any 
case. But given this is just a prototype, I want the freedom to be able to make 
changes without being bound by the existing architecture  and class hierarchy 
of the client. Even if I do re-implement some of the parts I'll make sure that 
the client can (a) Do metadata requests so it can react to leaders moving etc. 
(b) Actually read from multiple topic/partitions spread across multiple brokers 
and not just a single broker. Again since this is just a rewrite with the sole 
purpose of exploring possible performance improvements there can be mainly two 
consequences:
i) It shows no improvements: In that case we end up not spending too much time 
changing the current code, and the hacky code just gets us to this conclusion 
faster.
ii) It shows interesting improvements: If this were true, we can afford to 
spend some time seeing which things actually improved performance and make a 
call on how to integrate best.

It might be counterproductive to look at the current client implementation and 
look at the % of time spent in each of the bottlenecks because those numbers 
are a consequence of the current memory layout. For example if we do an on the 
fly CRC check and decompression - CRC check time might go up a bit because now 
we are not striding over a contiguous ByteBuffer in one sweep. Right now the 
current client has this pattern: CRC check on Message1 --> CRC check on 
Message2 --> ... --> CRC check on MessageN --> Hand message 1 to consumer --> 
... --> Hand message N to consumer. Instead, with the current proposal we will 
have a pattern of: Do CRC on Message1 --> Hand Message1 to consumer --> Do 
CRC on Message2 --> Hand Message2 to the consumer --> and so on. So the CRC checks are 
separated by potential (certain?) cache floundering during the handling of the 
message by the client. On the other hand from the perspective of the consumer, 
the pattern looks like this -- Do CRC and validation on all messages starting 
with 1 to  N --> Hand messages 1 to N to client. Now by the time the Kafka 
consumer is done with validating and deserializing message N, message 1 is 
possibly already out of the cache. With the new approach since we hand over a 
message right after validating it, we give the consumer a hot in cache message, 
which might improve the consumer processing enough to offset the loss in 
CRC striding efficiency. Or it may not. It might just turn out that doing the 
CRC validation upfront is just a pure win since all the CRC tables will be in 
cache etc and striding access for the CRC math is worth an extra iteration of 
the ByteBuffer contents. But it might still be more profitable to elide 
copies and prevent creation of objects by doing on the fly decoding and handing 
out indexes into the actual response ByteBuffer. This result might further be 
affected by how expensive the deserialization and processing of the message is. 
If the message is a bloated JSON encoded object that is de-serialized into a 
POJO and then processed really slowly then none of this will probably matter. 
On the other hand, if the message is compact and binary 
processed with minimal cache misses, this stuff might add up. My point is that 
basing the TODOs on the current profile may not be optimal because the profile 
is a massive consequence of the current layout and allocation patterns. Also 
the profile will give %s and we might be able to keep the same %s but just 
still reduce the overall time taken for the entire consumer processing cycle. 
Just to belabor the point even further, the current hash map implementations 
might suffer so many cache misses that they mask an underlying improvement 
opportunity for the data in the maps. Switching to compact primitive arrays 
based open hash maps might surface that opportunity again.

Is there a performance test that is used to keep track of the new Consumer's 
performance? If so maybe I can wrap that in a JMH suite and re-use that to test 
improvements?


was (Author: rzidane):
[~jkreps] Totally agree with you on the concerns with a re-write. I am sure 
I'll end up re-using most of the code, otherwise it will take too long in any 
case. But given this is just a prototype, I want the freedom to be able to make 
changes without being bound by the existing architecture  and class hierarchy 
of the client. Even if I do re-implement some of the parts I'll make sure that 
the client can (a) Do metadata requests so it can react to leaders moving etc. 
(b) Actually read from multiple topic/partitions spr

Re: Metrics package discussion

2015-03-28 Thread Jay Kreps
I think Joel's summary is good.

I'll add a few more points:

As discussed, memory matters a lot if we want to be able to give percentiles
at the client or topic level, in which case we will have thousands of them.
If we just do histograms at the global level then it is not a concern. The
argument for doing histograms at the client and topic level is that
averages are often very misleading, especially for latency information or
other asymmetric distributions. Most people who care about this kind of
thing would say the same. If you are a user of a multi-tenant cluster then
you probably care a lot more about stats for your application or your topic
than about the global ones, so it could be nice to have histograms for these. I
don't feel super strongly about this.

The ExponentiallyDecayingSample is internally backed by a ConcurrentSkipListMap,
which seems to have an overhead of about 64 bytes per entry, so a 1000-element
sample is roughly 64KB. For global metrics this is fine, but for granular
metrics it is not workable.

Two other issues I'm not sure about:

1. Is there a way to get metric descriptions into the Coda Hale JMX output?
One of the nicest practical things about the new client metrics is
that if you look at them in jconsole each metric has an associated
description that explains what it means. I think this is a nice usability
thing--it is really hard to know what to make of the current metrics
without this kind of documentation, and keeping separate docs up-to-date is
hard; even if you do it, most people won't find them.
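For comparison, this is roughly how the new-client metrics package (o.a.k.common.metrics) attaches a description that JmxReporter then surfaces in jconsole; treat the exact constructors and method names as assumptions to verify against the code, and the metric names here as made-up examples.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.JmxReporter;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;

// Sketch: each MetricName carries a human-readable description, which the JMX
// reporter exposes alongside the value.
public class DescribedMetricsExample {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        metrics.addReporter(new JmxReporter("kafka.example"));

        Map<String, String> tags = new HashMap<String, String>();
        tags.put("client-id", "demo");

        Sensor requestLatency = metrics.sensor("request-latency");
        requestLatency.add(new MetricName("request-latency-avg", "demo-metrics",
                "The average request latency in ms", tags), new Avg());
        requestLatency.add(new MetricName("request-latency-max", "demo-metrics",
                "The maximum request latency in ms", tags), new Max());

        requestLatency.record(42.0);
    }
}
```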

2. I'm not clear if the sample decay in the histogram is actually the same
as for the other stats. It seems like it isn't, but that would make
interpretation quite difficult. In other words, if I have N metrics
including some Histograms, some Meters, etc., are all these measurements
taken over the same time window? I actually think they are not; it looks
like there are different sampling methodologies across the stat types. So if
you have a dashboard that plots these things side by side, the measurement
at a given point in time is not actually comparable across multiple stats.
Am I confused about this?

-Jay


On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy  wrote:

> For the samples: it will be at least double that estimate I think
> since the long array contains (eight byte) references to the actual
> longs, each of which also has some object overhead.
>
> Re: testing: actually, it looks like YM metrics does allow you to
> drop in your own clock:
>
> https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Clock.java
>
> https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Meter.java#L36
>
> Not sure if it was mentioned in this (or some recent) thread but a
> major motivation in the kafka-common metrics (KM) was absorbing API
> changes and even mbean naming conventions. For e.g., in the early
> stages of 0.8 we picked up YM metrics 3.x but collided with client
> apps at LinkedIn which were still on 2.x. We ended up changing our
> code to use 2.x in the end. Having our own metrics package makes us
> less vulnerable to these kinds of changes. The multiple version
> collision problem is obviously less of an issue with the broker but we
> are still exposed to possible metric changes in YM metrics.
>
> I'm wondering if we need to weigh too much toward the memory overheads
> of histograms in making a decision here simply because I don't think
> we have found them to be an extreme necessity for
> per-clientid/per-partition metrics and they are more critical for
> aggregate (global) metrics.
>
> So it seems the main benefits of switching to KM metrics are:
> - Less exposure to YM metrics changes
> - More control over the actual implementation. E.g., there is
>   considerable research on implementing approximate-but-good-enough
>   histograms/percentiles that we can try out
> - Differences (improvements) from YM metrics such as:
>   - hierarchical sensors
>   - integrated with quota enforcement
>   - mbeans can logically group attributes computed from different
> sensors. So there is logical grouping (as opposed to a separate
> mbean per sensor as is the case in YM metrics).
>
> The main disadvantages:
> - Everyone's graphs and alerts will break and need to be updated
> - Histogram support needs to be tested more/improved
>
> The first disadvantage is a big one but we aren't exactly immune to
> that if we stick with YM.
>
> BTW with KM metrics we should also provide reporters (graphite,
> ganglia) but we probably need to do this anyway since the new clients
> are on KM metrics.
>
> Thanks,
>
> Joel
>
> On Fri, Mar 27, 2015 at 06:48:48PM +, Aditya Auradkar wrote:
> > Adding to what Jay said.
> >
> > The library maintains 1k samples by default. The UniformSample has a
> long array so about 8k overhead per histogram. The
> ExponentiallyDecayingSample (which is what we use) has a 16 byte overhead
> per stored sample, s

[jira] [Commented] (KAFKA-2044) Support requests and responses from o.a.k.common in KafkaApis

2015-03-28 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385591#comment-14385591
 ] 

Gwen Shapira commented on KAFKA-2044:
-

Thanks for the cleanup and commit.

I'll file subjiras for KAFKA-1927 and start submitting patches on those (others 
are welcome to join the effort).

> Support requests and responses from o.a.k.common in KafkaApis
> -
>
> Key: KAFKA-2044
> URL: https://issues.apache.org/jira/browse/KAFKA-2044
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Fix For: 0.8.3
>
> Attachments: KAFKA-2044.patch, KAFKA-2044_2015-03-25_16:48:01.patch, 
> KAFKA-2044_2015-03-25_16:48:49.patch, KAFKA-2044_2015-03-25_16:53:05.patch, 
> KAFKA-2044_2015-03-25_18:49:24.patch, KAFKA-2044_2015-03-25_19:20:16.patch, 
> KAFKA-2044_2015-03-27_09:22:28.patch
>
>
> As groundwork for KIP-4 and for KAFKA-1927, we'll add some plumbing to 
> support handling of requests and responses from o.a.k.common in KafkaApis.
> This will allow us to add new API calls just in o.a.k.common and to gradually 
> migrate existing requests and responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2064) Replace ConsumerMetadataRequest with org.apache.kafka.common.requests object

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2064:
---

 Summary: Replace ConsumerMetadataRequest with  
org.apache.kafka.common.requests object
 Key: KAFKA-2064
 URL: https://issues.apache.org/jira/browse/KAFKA-2064
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace ConsumerMetadataRequest with  org.apache.kafka.common.requests object



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2064) Replace ConsumerMetadataRequest and Response with org.apache.kafka.common.requests objects

2015-03-28 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2064:

Description: Replace ConsumerMetadataRequest and response with  
org.apache.kafka.common.requests objects  (was: Replace ConsumerMetadataRequest 
with  org.apache.kafka.common.requests object)

> Replace ConsumerMetadataRequest and Response with  
> org.apache.kafka.common.requests objects
> ---
>
> Key: KAFKA-2064
> URL: https://issues.apache.org/jira/browse/KAFKA-2064
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Gwen Shapira
> Fix For: 0.8.3
>
>
> Replace ConsumerMetadataRequest and response with  
> org.apache.kafka.common.requests objects



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2064) Replace ConsumerMetadataRequest and Response with org.apache.kafka.common.requests objects

2015-03-28 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2064:

Summary: Replace ConsumerMetadataRequest and Response with  
org.apache.kafka.common.requests objects  (was: Replace ConsumerMetadataRequest 
with  org.apache.kafka.common.requests object)

> Replace ConsumerMetadataRequest and Response with  
> org.apache.kafka.common.requests objects
> ---
>
> Key: KAFKA-2064
> URL: https://issues.apache.org/jira/browse/KAFKA-2064
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Gwen Shapira
> Fix For: 0.8.3
>
>
> Replace ConsumerMetadataRequest with  org.apache.kafka.common.requests object



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2065) Add ControlledShutdown to org.apache.kafka.common.requests and replace current use in core module

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2065:
---

 Summary: Add ControlledShutdown to  
org.apache.kafka.common.requests and replace current use in core module
 Key: KAFKA-2065
 URL: https://issues.apache.org/jira/browse/KAFKA-2065
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Add ControlledShutdown to  org.apache.kafka.common.requests and replace current 
use in core module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2066) Replace FetchRequest / FetchResponse with their org.apache.kafka.common.requests equivalents

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2066:
---

 Summary: Replace FetchRequest / FetchResponse with their 
org.apache.kafka.common.requests equivalents
 Key: KAFKA-2066
 URL: https://issues.apache.org/jira/browse/KAFKA-2066
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace FetchRequest / FetchResponse with their 
org.apache.kafka.common.requests equivalents.

Note that they can't be completely removed until we deprecate the 
SimpleConsumer API (and it will require very careful patchwork for the places 
where core modules actually use the SimpleConsumer API).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2067) Add LeaderAndISR request/response to org.apache.kafka.common.requests and replace usage in core module

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2067:
---

 Summary: Add LeaderAndISR request/response to 
org.apache.kafka.common.requests and replace usage in core module
 Key: KAFKA-2067
 URL: https://issues.apache.org/jira/browse/KAFKA-2067
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Add LeaderAndISR request/response to org.apache.kafka.common.requests and 
replace usage in core module.

Note that this will require adding a bunch of new objects to o.a.k.common - 
LeaderAndISR, LeaderISRAndEpoch and possibly others.

It may be nice to have a Scala implicit to translate those objects from their 
old (core) implementation to the o.a.k.common implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2068) Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2068:
---

 Summary: Replace OffsetCommit Request/Response with  
org.apache.kafka.common.requests  equivalent
 Key: KAFKA-2068
 URL: https://issues.apache.org/jira/browse/KAFKA-2068
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace OffsetCommit Request/Response with  org.apache.kafka.common.requests  
equivalent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2069) Replace OffsetFetch request/response with their org.apache.kafka.common.requests equivalent

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2069:
---

 Summary: Replace OffsetFetch request/response with their   
org.apache.kafka.common.requests  equivalent
 Key: KAFKA-2069
 URL: https://issues.apache.org/jira/browse/KAFKA-2069
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace OffsetFetch request/response with their  
org.apache.kafka.common.requests  equivalent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2070) Replace OffsetRequest/response with ListOffsetRequest/response from org.apache.kafka.common.requests

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2070:
---

 Summary: Replace OffsetRequest/response with 
ListOffsetRequest/response from org.apache.kafka.common.requests
 Key: KAFKA-2070
 URL: https://issues.apache.org/jira/browse/KAFKA-2070
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace OffsetRequest/response with ListOffsetRequest/response from 
org.apache.kafka.common.requests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2071) Replace Produce Request/Response with their org.apache.kafka.common.requests equivalents

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2071:
---

 Summary: Replace Produce Request/Response with their 
org.apache.kafka.common.requests equivalents
 Key: KAFKA-2071
 URL: https://issues.apache.org/jira/browse/KAFKA-2071
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace Produce Request/Response with their org.apache.kafka.common.requests 
equivalents



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2066) Replace FetchRequest / FetchResponse with their org.apache.kafka.common.requests equivalents

2015-03-28 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2066:

Description: 
Replace FetchRequest / FetchResponse with their 
org.apache.kafka.common.requests equivalents.

Note that they can't be completely removed until we deprecate the 
SimpleConsumer API (and it will require very careful patchwork for the places 
where core modules actually use the SimpleConsumer API).

This also requires a solution for how to stream from memory-mapped files 
(similar to what the existing code does with FileMessageSet).

  was:
Replace FetchRequest / FetchResponse with their 
org.apache.kafka.common.requests equivalents.

Note that they can't be completely removed until we deprecate the 
SimpleConsumer API (and it will require very careful patchwork for the places 
where core modules actually use the SimpleConsumer API).


> Replace FetchRequest / FetchResponse with their 
> org.apache.kafka.common.requests equivalents
> 
>
> Key: KAFKA-2066
> URL: https://issues.apache.org/jira/browse/KAFKA-2066
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Gwen Shapira
> Fix For: 0.8.3
>
>
> Replace FetchRequest / FetchResponse with their 
> org.apache.kafka.common.requests equivalents.
> Note that they can't be completely removed until we deprecate the 
> SimpleConsumer API (and it will require very careful patchwork for the places 
> where core modules actually use the SimpleConsumer API).
> This also requires a solution for how to stream from memory-mapped files 
> (similar to what the existing code does with FileMessageSet).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2072) Add StopReplica request/response to o.a.k.common.requests and replace the usage in core module

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2072:
---

 Summary: Add StopReplica request/response to o.a.k.common.requests 
and replace the usage in core module
 Key: KAFKA-2072
 URL: https://issues.apache.org/jira/browse/KAFKA-2072
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Add StopReplica request/response to o.a.k.common.requests and replace the usage 
in core module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2073) Replace TopicMetadata request/response with o.a.k.requests.metadata

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2073:
---

 Summary: Replace TopicMetadata request/response with 
o.a.k.requests.metadata
 Key: KAFKA-2073
 URL: https://issues.apache.org/jira/browse/KAFKA-2073
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Replace TopicMetadata request/response with o.a.k.requests.metadata.

Note, this is more challenging than it appears because while the wire protocol 
is identical, the objects are completely different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2074) Add UpdateMetadata request/response to o.a.k.common.requests and replace its use in core module

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2074:
---

 Summary: Add UpdateMetadata request/response to 
o.a.k.common.requests and replace its use in core module
 Key: KAFKA-2074
 URL: https://issues.apache.org/jira/browse/KAFKA-2074
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Add UpdateMetadata request/response to o.a.k.common.requests and replace its 
use in core module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2075) Validate that all kafka.api requests have been removed and clean up compatibility code

2015-03-28 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-2075:
---

 Summary: Validate that all kafka.api requests have been removed and 
clean up compatibility code
 Key: KAFKA-2075
 URL: https://issues.apache.org/jira/browse/KAFKA-2075
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira


Once we finish all the other subtasks, the old kafka.api requests/responses 
shouldn't be used anywhere.

We need to validate that the classes are indeed gone, remove the unit tests for 
serializing/deserializing them, and clean up the compatibility code added in 
KAFKA-2044.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-11- Authorization design for kafka security

2015-03-28 Thread Gwen Shapira
Preparing for Tuesday's meeting, I went over the KIP :)

First, Parth did an amazing job, the KIP is fantastic - detailed and
readable. Thank you!

Second, I have a long list of questions :) No objections, just some
things I'm unclear on and random minor comments. In general, I like
the design, I just feel I'm missing parts of the picture.

1. "Yes, Create topic will have an optional acls, the output of
describe will display owner and acls and alter topic will allow to
modify the acls.” - it will be nice to see what the CLI will look like.

2. I like the addition of Topic owner. We made the mistake of
forgetting about it when adding authorization to Sqoop2. We probably
want to add a “chown” command to the topic commands.

3. "Kafka server will read "authorizer.class” config value at startup
time, create an instance of the specified class and call initialize
method."
We’ll need to validate that users specify only one of those.

4. "One added assumption is that on non-secure connections the session
will have principal set to an object whose name() method will return
"Anonymous”."
Can we keep DrWho? :)

5. "For cluster actions that do not apply to a specific topic like
CREATE we have 2 options. We can either add a broker config called
broker.acls which will point to a json file. This file will be
available on all broker hosts and authorizer will read the acls on
initialization and keep refreshing it every X minutes. Any changes
will require re-distribution of the acl json file. Alternatively we
can add a zookeeper path /brokers/acls and store the acl json as data.
Authorizer can refresh the acl from json every X minutes. In absence
of broker acls the authorizer will fail open, in other words it will
allow all users from all hosts to perform all cluster actions”
I prefer a file to ZK - since that's where we store all user-defined
configurations for now. Everyone knows how to secure a file system :)

6. "When an Acl is missing , this implementation will always fail open
for backward compatibility. “ <- agree, but we need to document that
this makes the default authorizer non-secure

7. "If the value of authorizer.class.name is null, in secure mode the
cluster will fail with ConfigException. In non secure mode in absence
of config value for authorizer.class.name the server will allow all
requests to all topics, even if the topic has configured acls”
<- I don’t think Kafka has “secure mode” - it can support SSL and
plaintext (un-authenticated) on two different ports simultaneously. I
don’t object to adding such configuration, but we need to decide
exactly what it means.

8. "Currently all topic creation/modification/deletion actions are
performed using KafkaAdminUtil which mostly interacts directly with
zookeeper instead of forwarding requests to a broker host. Given all
the code is executed on client side there is no easy way to perform
authorization. “ - since we are adding the admin protocol requests in
KIP-4, perhaps addressing those makes sense.

9. I didn’t see a specification of what a “resource” is; I assume it's an
enum with things like Topic and… ?

10. I’m also unclear on where and how “PermissionType” will be used
and what “DENY takes precedence” means here.

11. "What does acls on zookeeper node look like given all our admin
APIs are currently performed directly from client?” <- “secure mode”
Kafka will need to set ACLs on ZK (using ZK’s model of ACLs) and both
Kafka and everyone else will need to use them (this is limited to
Kerberos authentication AFAIK.)

12. "Do we want to support group acls as part of this authorizer? Do
we want to support principal to local user mapping? If yes we need to
add plugins for UserToGroupMapper and PrincipalToUserMapper.” <-
Sentry uses Groups for authorizing, so we need to support that. I
figured that as long as the API specifies Principal, it typically
contains both user and group, so nothing else is needed. Did I miss
anything?

13. It looks like the Authorizer stores the ACLs and not Kafka. So we
need an API for Kafka to notify Authorizer when a topic is added and
when ACLs are modified, right? I didn’t see that.

14. Are we going to have any API for Kafka to give out the ACLs on a
topic? Or we leave this to the Authorizer?



On Wed, Mar 25, 2015 at 9:26 PM, Neha Narkhede  wrote:
> Parth,
>
> We can make some 15 mins or so to discuss this at the next KIP hangout.
>
> Thanks,
> Neha
>
> On Wed, Mar 25, 2015 at 1:07 PM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
>> Hi all,
>>
>> I have modified the KIP to reflect the recent change request from the
>> reviewers. I have been working on the code and I have the server side code
>> for authorization ready. I am now modifying the command line utilities. I
>> would really appreciate if some of the committers can spend sometime to
>> review the KIP so we can make progress on this.
>>
>> Thanks
>> Parth
>>
>> On 3/18/15, 2:20 PM, "Michael Herstine" 
>> wrote:
>>
>> >Hi Parth,
>> >
>> >Thanks! A few questio

Re: [DISCUSSION] Keep docs updated per jira

2015-03-28 Thread Gwen Shapira
On Thu, Mar 26, 2015 at 9:46 PM, Jay Kreps  wrote:
> The reason the docs are in svn is that when we were setting up the site
> apache required that to publish doc changes. Two possible fixes:
> 1. Follow up with infra to see if they have git integration working yet
> 2. Move to a model where doc "source" is kept in the main git and we use
> Jekyll or something like that to generate the resulting html (i.e. with things
> like headers) and then check that in to svn to publish it.
>
> The second item would have the advantage of not needing to configure apache
> includes to see the docs, but would likely trade it for Jekyll setup stuff.
> Jekyll might actually fix a lot of the repetitive stuff in the docs today
> (e.g. generating section numbers, adding  tags, etc).

What we do at other projects that I'm familiar with (Sqoop, Flume) is
that we manage the docs "source" (i.e. asciidoc or similar) in git.
When it's time to release a version (i.e. after a successful vote on the
RC), we build the docs, manually copy them to the SVN repo and push (or
whatever the SVN equivalent is...). It's a bit of a manual pain, but it's
only a few times a year. This is also a good opportunity to upload
updated javadocs, docs generated from ConfigDef and KafkaMetrics, etc.

Gwen


>
> -Jay
>
> On Thu, Mar 26, 2015 at 8:22 PM, Jiangjie Qin 
> wrote:
>
>>
>>
>> On 3/26/15, 7:00 PM, "Neha Narkhede"  wrote:
>>
>> >>
>> >> Much much easier to do this if the docs are in git and can be reviewed
>> >>and
>> >> committed / reverted with the code (transactions makes synchronization
>> >> easier...).
>> +1 on this, too!
>> >
>> >
>> >Huge +1.
>> >
>> >On Thu, Mar 26, 2015 at 6:54 PM, Joel Koshy  wrote:
>> >
>> >> +1
>> >>
>> >> It is indeed too easy to forget and realize only much later that a
>> >> jira needed a doc update. So getting into the habit of asking "did you
>> >> update the docs" as part of review will definitely help.
>> >>
>> >> On Thu, Mar 26, 2015 at 06:36:43PM -0700, Gwen Shapira wrote:
>> >> > I strongly support the goal of keeping docs and code in sync.
>> >> >
>> >> > Much much easier to do this if the docs are in git and can be reviewed
>> >> and
>> >> > committed / reverted with the code (transactions makes synchronization
>> >> > easier...).
>> >> >
>> >> > This will also allow us to:
>> >> > 1. Include the docs in the bits we release
>> >> > 2. On release, update the website with the docs from the specific
>> >>branch
>> >> > that was just released
>> >> > 3. Hook our build to ReadTheDocs and update the "trunk" docs with
>> >>every
>> >> > commit
>> >> >
>> >> >
>> >> > Tons of Apache projects do this already and having reviews enforce the
>> >> "did
>> >> > you update the docs" before committing is the best way to guarantee
>> >> updated
>> >> > docs.
>> >> >
>> >> > Gwen
>> >> >
>> >> > On Thu, Mar 26, 2015 at 6:27 PM, Jun Rao  wrote:
>> >> >
>> >> > > Hi, Everyone,
>> >> > >
>> >> > > Quite a few jiras these days require documentation changes (e.g.,
>> >>wire
>> >> > > protocol, ZK layout, configs, jmx, etc). Historically, we have been
>> >> > > updating the documentation just before we do a release. The issue is
>> >> that
>> >> > > some of the changes will be missed since they were done a while
>> >>back.
>> >> > > Another way to do that is to keep the docs updated as we complete
>> >>each
>> >> > > jira. Currently, our documentations are in the following places.
>> >> > >
>> >> > > wire protocol:
>> >> > >
>> >> > >
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Pr
>> >>otocol
>> >> > > ZK layout:
>> >> > >
>> >> > >
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+i
>> >>n+Zookeeper
>> >> > > configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083
>> >> > >
>> >> > > We probably don't need to update configs already ported to ConfigDef
>> >> since
>> >> > > they can be generated automatically. However, for the rest of the
>> >>doc
>> >> > > related changes, keeping them updated per jira seems a better
>> >>approach.
>> >> > > What do people think?
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > Jun
>> >> > >
>> >>
>> >>
>> >
>> >
>> >--
>> >Thanks,
>> >Neha
>>
>>