[jira] [Commented] (KAFKA-4107) Support offset reset capability in Kafka Connect

2017-04-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956446#comment-15956446
 ] 

Sönke Liebau commented on KAFKA-4107:
-

Is anybody actively working on this? If not, I'd be willing to come up with a 
proposal for this functionality to get the ball rolling.

> Support offset reset capability in Kafka Connect
> 
>
> Key: KAFKA-4107
> URL: https://issues.apache.org/jira/browse/KAFKA-4107
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jason Gustafson
>
> It would be useful in some cases to be able to reset connector offsets. For 
> example, if a topic in Kafka corresponding to a source database is 
> accidentally deleted (or deleted because of corrupt data), an administrator 
> may want to reset offsets and reproduce the log from the beginning. It may 
> also be useful to have support for overriding offsets, but that seems like a 
> less likely use case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4901) Make ProduceRequest thread-safe

2017-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956565#comment-15956565
 ] 

ASF GitHub Bot commented on KAFKA-4901:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2810

KAFKA-4901: Make ProduceRequest thread-safe

A more conservative version of the change for the 0.10.2
branch.

Trunk commit: 1659ca1773596b.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4901-produce-request-thread-safety-0-10-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2810


commit 91b0aa305e368b84e506d2ab8bc92d1a3a031d7c
Author: Ismael Juma 
Date:   2017-03-16T16:13:58Z

KAFKA-4901: Make ProduceRequest thread-safe

A more conservative version of the change for the 0.10.2
branch.




> Make ProduceRequest thread-safe
> ---
>
> Key: KAFKA-4901
> URL: https://issues.apache.org/jira/browse/KAFKA-4901
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.11.0.0, 0.10.2.1
>
>
> If request logging is enabled, ProduceRequest can be accessed
> and mutated concurrently from a network thread (which calls
> toString) and the request handler thread (which calls
> clearPartitionRecords()).
> That can lead to a ConcurrentModificationException when iterating
> the partitionRecords map.
> The underlying thread-safety issue has existed since the server
> started using the Java implementation of ProduceRequest in 0.10.0.
> However, we were incorrectly not clearing the underlying struct until
> 0.10.2, so toString itself was thread-safe until that change. In 0.10.2,
> toString is no longer thread-safe and we could potentially see a
> NullPointerException given the right set of interleavings between
> toString and clearPartitionRecords although we haven't seen that
> happen yet.
> In trunk, we changed the requests to have a toStruct method
> instead of creating a struct in the constructor and toString was
> no longer printing the contents of the Struct. This accidentally
> fixed the race condition, but it meant that request logging was less
> useful.
> A couple of days ago, AbstractRequest.toString was changed to
> print the contents of the request by calling toStruct().toString()
> and reintroduced the race condition. The impact is more visible
> because we iterate over a HashMap, which proactively
> checks for concurrent modification (unlike arrays).
> We will need a separate PR for 0.10.2.
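
For illustration, a minimal sketch of the race described above (hypothetical
class, not Kafka's actual code): one thread iterates a HashMap the way a
toString() would while another thread clears it, and the map's fail-fast
iterator can surface a ConcurrentModificationException.

{code}
import java.util.HashMap;
import java.util.Map;

public class ProduceRequestRaceDemo {
    // Stands in for the partitionRecords map mutated by clearPartitionRecords().
    private static final Map<Integer, String> partitionRecords = new HashMap<>();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100_000; i++)
            partitionRecords.put(i, "records-" + i);

        // Stands in for the network thread calling toString() on the request.
        Thread logger = new Thread(() -> {
            try {
                StringBuilder sb = new StringBuilder();
                for (Map.Entry<Integer, String> e : partitionRecords.entrySet())
                    sb.append(e.getKey()).append(',');
            } catch (java.util.ConcurrentModificationException e) {
                System.out.println("race detected: " + e);
            }
        });

        logger.start();
        partitionRecords.clear(); // the request handler thread's mutation
        logger.join();
    }
}
{code}

Being timing-dependent, the exception may take a few runs to appear.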





[GitHub] kafka pull request #2810: KAFKA-4901: Make ProduceRequest thread-safe

2017-04-05 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2810

KAFKA-4901: Make ProduceRequest thread-safe

A more conservative version of the change for the 0.10.2
branch.

Trunk commit: 1659ca1773596b.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4901-produce-request-thread-safety-0-10-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2810


commit 91b0aa305e368b84e506d2ab8bc92d1a3a031d7c
Author: Ismael Juma 
Date:   2017-03-16T16:13:58Z

KAFKA-4901: Make ProduceRequest thread-safe

A more conservative version of the change for the 0.10.2
branch.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2777: HOTFIX: WindowedStreamPartitioner does not provide...

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2777




[GitHub] kafka pull request #2781: MINOR: fix flaky StateDirectoryTest

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2781




Re: [VOTE] KIP-122: Add Reset Consumer Group Offsets tooling

2017-04-05 Thread Jorge Esteban Quilcate Otoya
Thanks Ismael! Response inline.

On Tue, Apr 4, 2017 at 17:43, Ismael Juma ()
wrote:

Sorry for the delay Jorge. Responses inline.

On Thu, Mar 23, 2017 at 5:56 PM, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> @Ismael, thanks for your feedback!
> 1. Good point. I will add optional support for a timezone as part of the
> datetime input. But when the datetime has no timezone, would it be more
> consistent to get the timezone from the cluster first and then reset based
> on that value? I am not sure it is possible to get that info from the
> cluster. In case it's not available, I could add a note advising that when
> the timezone is not specified, the tool will take it from the client, and
> it would be the user's responsibility to validate that it is aligned with
> the server.
>

There's no way to get such data from the Cluster today. It's relatively
common for servers to use UTC as their timezone though. Is there any value
in using the client timezone? Today's apps typically have data from all
over and what are the chances that the time zone from where the client is
running is the correct one?


You're right, it makes sense to use UTC by default and accept a timezone as
part of the input value. I updated the KIP.
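
For illustration, a small sketch of that semantics (hypothetical helper, not
the tool's actual code): honor an explicit offset in the datetime argument,
otherwise fall back to UTC rather than the client's zone.

{code}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateTimeArg {
    static Instant parse(String arg) {
        try {
            // Datetime carries an explicit offset: honor it.
            return Instant.from(DateTimeFormatter.ISO_OFFSET_DATE_TIME.parse(arg));
        } catch (DateTimeParseException e) {
            // No offset present: interpret as UTC, not client-local time.
            return LocalDateTime.parse(arg, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                                .toInstant(ZoneOffset.UTC);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2017-04-05T10:00:00+02:00")); // 2017-04-05T08:00:00Z
        System.out.println(parse("2017-04-05T10:00:00"));       // 2017-04-05T10:00:00Z
    }
}
{code}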



> 2. Happy to add it to the KIP.
>

OK.


> 3. This was part of the discussion thread; we ended up with `shift-by` to
> avoid adding `offset` to each case and to make it a bit more consistent.
>

OK.


> 4. I could remove CURRENT-OFFSET and CURRENT-LAG from the output, and
> leave it as part of the `describe` operation, if that's better.
>

It seems better to me as one would hope people would look at describe to
find the current values.

Done, KIP updated.


> 5. Agreed. At the beginning we considered `shift-plus` and `shift-minus`,
> but agreed to join them into one option and accept +/- as input. Maybe
> that's a better option?
>

Not sure, maybe it's fine as it is. I can't think of anything better, at
least.


Agreed, I also think it is good enough as it is now.
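
For reference, invocations under the options discussed here might look as
follows (illustrative only; the flag names are as proposed in the KIP draft
and could change before release):

{code}
# Shift the group's offsets forward or backward; the sign rides on the value.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group \
  --reset-offsets --shift-by +5 --topic my-topic --execute
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group \
  --reset-offsets --shift-by -5 --topic my-topic --execute
{code}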



Ismael


Jorge.


[jira] [Updated] (KAFKA-4935) Disable record level crc checks on the consumer with the KIP-98 message format

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4935:
---
Fix Version/s: 0.11.0.0

> Disable record level crc checks on the consumer with the KIP-98 message format
> --
>
> Key: KAFKA-4935
> URL: https://issues.apache.org/jira/browse/KAFKA-4935
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> With the new message format proposed in KIP-98, the record level CRC has been
> moved to the batch header. The consumer record API still allows retrieval of
> the CRC per record, and in this case we compute the CRC on the fly. This
> degrades performance, and such a computation also becomes unnecessary with
> the batch level CRC.
> We can address this by deprecating the record level CRC API. We can also work
> around this by disabling record level checks when the check crc config is set
> to false.





[jira] [Updated] (KAFKA-4935) Consider disabling record level CRC checks for message format V2

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4935:
---
Summary: Consider disabling record level CRC checks for message format V2  
(was: Disable record level crc checks on the consumer with the KIP-98 message 
format)

> Consider disabling record level CRC checks for message format V2
> 
>
> Key: KAFKA-4935
> URL: https://issues.apache.org/jira/browse/KAFKA-4935
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> With the new message format proposed in KIP-98, the record level CRC has been
> moved to the batch header. The consumer record API still allows retrieval of
> the CRC per record, and in this case we compute the CRC on the fly. This
> degrades performance, and such a computation also becomes unnecessary with
> the batch level CRC.
> We can address this by deprecating the record level CRC API. We can also work
> around this by disabling record level checks when the check crc config is set
> to false.





[jira] [Updated] (KAFKA-4935) Consider disabling record level CRC checks for message format V2

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4935:
---
Description: 
With the new message format proposed in KIP-98, the record level CRC has been 
moved to the batch header.

Because we expose the record-level CRC in `RecordMetadata` and 
`ConsumerRecord`, we currently compute it eagerly based on the key, value and 
timestamp even though these methods are rarely used. Ideally, we'd deprecate 
the relevant methods in `RecordMetadata` and `ConsumerRecord` while making the 
CRC computation lazy. This seems pretty hard to achieve in the Producer without 
increasing memory retention, but it may be possible to do in the Consumer.

An alternative option is to return the batch CRC from the relevant methods.

  was:
With the new message format proposed in KIP-98, the record level CRC has been
moved to the batch header. The consumer record API still allows retrieval of
the CRC per record, and in this case we compute the CRC on the fly. This
degrades performance, and such a computation also becomes unnecessary with the
batch level CRC.

We can address this by deprecating the record level CRC API. We can also work
around this by disabling record level checks when the check crc config is set
to false.
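
To illustrate the lazy-computation idea, a minimal sketch (hypothetical class,
not the actual `ConsumerRecord`/`RecordMetadata` API): the record-level
checksum is computed only on first access instead of eagerly for every record.

{code}
import java.util.zip.CRC32;

public class LazyChecksumRecord {
    private final byte[] key;
    private final byte[] value;
    private Long checksum; // null until first requested

    public LazyChecksumRecord(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    // Computed on demand; timestamp omitted here for brevity.
    public synchronized long checksum() {
        if (checksum == null) {
            CRC32 crc = new CRC32();
            if (key != null) crc.update(key);
            if (value != null) crc.update(value);
            checksum = crc.getValue();
        }
        return checksum;
    }
}
{code}

The trade-off named above shows up here: laziness requires holding on to key
and value, which is why the Producer side risks increased memory retention.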


> Consider disabling record level CRC checks for message format V2
> 
>
> Key: KAFKA-4935
> URL: https://issues.apache.org/jira/browse/KAFKA-4935
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> With the new message format proposed in KIP-98, the record level CRC has been 
> moved to the batch header.
> Because we expose the record-level CRC in `RecordMetadata` and 
> `ConsumerRecord`, we currently compute it eagerly based on the key, value and 
> timestamp even though these methods are rarely used. Ideally, we'd deprecate 
> the relevant methods in `RecordMetadata` and `ConsumerRecord` while making 
> the CRC computation lazy. This seems pretty hard to achieve in the Producer 
> without increasing memory retention, but it may be possible to do in the 
> Consumer.
> An alternative option is to return the batch CRC from the relevant methods.





[jira] [Updated] (KAFKA-4935) Consider disabling record level CRC checks for message format V2

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4935:
---
Issue Type: Sub-task  (was: Improvement)
Parent: KAFKA-4815

> Consider disabling record level CRC checks for message format V2
> 
>
> Key: KAFKA-4935
> URL: https://issues.apache.org/jira/browse/KAFKA-4935
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> With the new message format proposed in KIP-98, the record level CRC has been 
> moved to the batch header.
> Because we expose the record-level CRC in `RecordMetadata` and 
> `ConsumerRecord`, we currently compute it eagerly based on the key, value and 
> timestamp even though these methods are rarely used. Ideally, we'd deprecate 
> the relevant methods in `RecordMetadata` and `ConsumerRecord` while making 
> the CRC computation lazy. This seems pretty hard to achieve in the Producer 
> without increasing memory retention, but it may be possible to do in the 
> Consumer.
> An alternative option is to return the batch CRC from the relevant methods.





[GitHub] kafka pull request #2660: MINOR: Make ConfigDef safer by not using empty str...

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2660




[GitHub] kafka pull request #2801: MINOR: Fix multiple org.apache.kafka.streams.Kafka...

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2801




[GitHub] kafka pull request #2811: MINOR: update javadoc on ReadOnlyWindowStore

2017-04-05 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2811

MINOR: update javadoc on ReadOnlyWindowStore

Highlight that the range in `fetch` is inclusive of both `timeFrom` and 
`timeTo`
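
For context, a usage sketch of the clarified contract (store and key names
hypothetical): windows whose start time equals either endpoint are included.

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

class WindowFetchExample {
    static void printWindows(ReadOnlyWindowStore<String, Long> store,
                             long timeFrom, long timeTo) {
        // Both timeFrom and timeTo are inclusive bounds on the window start.
        try (WindowStoreIterator<Long> iter = store.fetch("my-key", timeFrom, timeTo)) {
            while (iter.hasNext()) {
                KeyValue<Long, Long> next = iter.next(); // key = window start timestamp
                System.out.println("window @" + next.key + " -> " + next.value);
            }
        }
    }
}
{code}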

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka minor-window-fetch-java-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2811


commit 86353dce8a5217ba4ba4ae4b863578803232b3bc
Author: Damian Guy 
Date:   2017-04-05T10:42:17Z

update javadoc on readonly window store






[jira] [Updated] (KAFKA-4913) creating a window store with one segment throws division by zero error

2017-04-05 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4913:
--
Fix Version/s: (was: 0.10.2.1)
   0.11.0.0

> creating a window store with one segment throws division by zero error
> --
>
> Key: KAFKA-4913
> URL: https://issues.apache.org/jira/browse/KAFKA-4913
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Xavier Léauté
>Assignee: Damian Guy
> Fix For: 0.11.0.0
>
>






[jira] [Updated] (KAFKA-4913) creating a window store with one segment throws division by zero error

2017-04-05 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4913:
--
Priority: Major  (was: Blocker)

> creating a window store with one segment throws division by zero error
> --
>
> Key: KAFKA-4913
> URL: https://issues.apache.org/jira/browse/KAFKA-4913
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Xavier Léauté
>Assignee: Damian Guy
> Fix For: 0.11.0.0
>
>






Jenkins build is back to normal : kafka-trunk-jdk7 #2073

2017-04-05 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : kafka-0.10.2-jdk7 #121

2017-04-05 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-138: Change punctuate semantics

2017-04-05 Thread Tianji Li
Hi Jay,

The hybrid solution is exactly what I expect and need for our use cases
when dealing with telecom data.

Thanks
Tianji

On Wed, Apr 5, 2017 at 12:01 AM, Jay Kreps  wrote:

> Hey guys,
>
> One thing I've always found super important for this kind of design work is
> to do a really good job of cataloging the landscape of use cases and how
> prevalent each one is. By that I mean not just listing lots of uses, but
> also grouping them into categories that functionally need the same thing.
> In the absence of this it is very hard to reason about design proposals.
> From the proposals so far I think we have a lot of discussion around
> possible apis, but less around what the user needs for different use cases
> and how they would implement that using the api.
>
> Here is an example:
> You aggregate click and impression data for a reddit like site. Every ten
> minutes you want to output a ranked list of the top 10 articles ranked by
> clicks/impressions for each geographical area. I want to be able to run this
> in steady state as well as rerun to regenerate results (or catch up if it
> crashes).
>
> There are a couple of tricky things that seem to make this hard with either
> of the options proposed:
> 1. If I emit this data using event time I have the problem described where
> a geographical region with no new clicks or impressions will fail to output
> results.
> 2. If I emit this data using system time I have the problem that when
> reprocessing data my window may not be ten minutes but 10 hours if my
> processing is very fast so it dramatically changes the output.
>
> Maybe a hybrid solution works: I window by event time but trigger results
> by system time for windows that have updated? Not really sure the details
> of making that work. Does that work? Are there concrete examples where you
> actually want the current behavior?
>
> -Jay
>
>
> On Tue, Apr 4, 2017 at 8:32 PM, Arun Mathew 
> wrote:
>
> > Hi All,
> >
> > Thanks for the KIP. We were also in need of a mechanism to trigger
> > punctuate in the absence of events.
> >
> > As I described in [
> > https://issues.apache.org/jira/browse/KAFKA-3514?
> > focusedCommentId=15926036&page=com.atlassian.jira.
> > plugin.system.issuetabpanels:comment-tabpanel#comment-15926036
> > ],
> >
> >- Our approached involved using the event time by default.
> >- The method to check if there is any punctuate ready in the
> >PunctuationQueue is triggered by any event received by the stream
> >thread, or at the polling interval in the absence of events.
> >- When we create Punctuate objects (which contain the next event time
> >for punctuation and the interval), we also record the creation time
> >(system time).
> >- While checking a punctuation schedule for maturity in the mayBePunctuate
> >method, we also check whether the punctuate interval has elapsed on the
> >system clock since the schedule's creation time.
> >- In the absence of any event, or in the absence of any event for one
> >topic in the partition group assigned to the stream task, the interval
> >will elapse in system time and we trigger a punctuate using the expected
> >punctuation event time.
> >- We then create the next punctuation schedule as punctuation event time
> >+ punctuation interval, [again recording the system time of creation of
> >the schedule].
> >
> > We call this a Hybrid Punctuate. Of course, this approach has pros and
> > cons.
> > Pros
> >
> >- Punctuates will happen within at most the punctuation interval in
> >terms of system time.
> >- The semantics as a whole continues to revolve around event time.
> >- We can use the old data [old timestamps] to rerun any experiments or
> >tests.
> >
> > Cons
> >
> >- In case the punctuation interval is not a time duration [say, logical
> >time/event count], then the approach might not be meaningful.
> >- In case we have to wait for an actual event from a low event rate
> >partition in the partition group, this approach will jump the gun.
> >- In case event processing cannot catch up with the event rate and
> >events with the expected timestamps get queued for a long time, this
> >approach might jump the gun.
> >
> > I believe the above approach and discussion comes close to approach A.
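
A minimal sketch of the hybrid schedule described above (hypothetical names,
not the Streams API): fire on event time when events flow, and fall back to
system time when the interval elapses without them.

{code}
class HybridPunctuationSchedule {
    final long intervalMs;
    long nextEventTimeTarget;  // event-time deadline
    long createdAtSystemMs;    // system-time fallback anchor

    HybridPunctuationSchedule(long startEventTimeMs, long intervalMs, long nowSystemMs) {
        this.intervalMs = intervalMs;
        this.nextEventTimeTarget = startEventTimeMs + intervalMs;
        this.createdAtSystemMs = nowSystemMs;
    }

    // Returns true if a punctuation should fire now, advancing the schedule
    // off the *expected* punctuation event time, as described above.
    boolean maybePunctuate(long streamEventTimeMs, long nowSystemMs) {
        boolean eventTimeMature = streamEventTimeMs >= nextEventTimeTarget;
        boolean systemTimeMature = nowSystemMs - createdAtSystemMs >= intervalMs;
        if (eventTimeMature || systemTimeMature) {
            nextEventTimeTarget += intervalMs;
            createdAtSystemMs = nowSystemMs;
            return true;
        }
        return false;
    }
}
{code}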
> >
> > ---
> >
> > I like the idea of having an event count based punctuate.
> >
> > ---
> >
> > I agree with the discussion around approach C, that we should provide the
> > user with the option to choose system time or event time based punctuates.
> > But I believe that the user predominantly wants to use event time while not
> > missing out on regular punctuates due to event delays or event absences.
> > Hence a complex punctuate option as Matthias mentioned (quoted below)
> > would be most apt.
> >
> > "- We might want to add "complex" schedules later on (like, punctuate on
> > every 10 seconds event-time or 60 sec

Build failed in Jenkins: kafka-trunk-jdk7 #2074

2017-04-05 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: Fix flaky StateDirectoryTest

[ismael] MINOR: Make ConfigDef safer by not using empty string for

[ismael] MINOR: Fix multiple KafkaStreams.StreamStateListener being instantiated

--
[...truncated 337.84 KB...]

[Test output elided: kafka.log.LogTest tests from
shouldDeleteTimeBasedSegmentsReadyToBeDeleted through
testBuildTimeIndexWhenNotAssigningOffsets all STARTED and PASSED, followed by
kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] through [13],
also all STARTED and PASSED; the output is cut off during
testBrokerSideCompression[14].]

[GitHub] kafka pull request #2812: Improve topic management instructions for Kafka St...

2017-04-05 Thread miguno
GitHub user miguno opened a pull request:

https://github.com/apache/kafka/pull/2812

Improve topic management instructions for Kafka Streams examples



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/miguno/kafka trunk-streams-examples-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2812


commit a67977612f303431a338f4d2896449a6f5fed560
Author: Michael G. Noll 
Date:   2017-04-05T13:11:13Z

Improve topic management instructions for Kafka Streams examples






[jira] [Commented] (KAFKA-4814) ZookeeperLeaderElector not respecting zookeeper.set.acl

2017-04-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956846#comment-15956846
 ] 

Ismael Juma commented on KAFKA-4814:


[~baluchicken], have you made progress on this one? If not, maybe [~rsivaram] 
could pick it up.

> ZookeeperLeaderElector not respecting zookeeper.set.acl
> ---
>
> Key: KAFKA-4814
> URL: https://issues.apache.org/jira/browse/KAFKA-4814
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.1.1
>Reporter: Stevo Slavic
>Assignee: Balint Molnar
>  Labels: newbie
> Fix For: 0.11.0.0, 0.10.2.1
>
>
> Per the [migration
> guide|https://kafka.apache.org/documentation/#zk_authz_migration] for
> enabling ZooKeeper security on an existing Apache Kafka cluster, and the
> [broker configuration
> documentation|https://kafka.apache.org/documentation/#brokerconfigs] for the
> {{zookeeper.set.acl}} configuration property, when this property is set to
> false Kafka brokers should not set any ACLs on ZooKeeper nodes, even when a
> JAAS config file is provisioned to the broker.
> The problem is that there is broker-side logic, like that in
> {{ZookeeperLeaderElector}}, making use of {{JaasUtils#isZkSecurityEnabled}},
> which does not respect this configuration property. This results in ACLs
> being set even when there is just a JAAS config file provisioned to the
> Kafka broker while {{zookeeper.set.acl}} is set to {{false}}.
> Notice that {{JaasUtils}} is in the {{org.apache.kafka.common.security}}
> package of the {{kafka-clients}} module, while {{zookeeper.set.acl}} is a
> broker-side-only configuration property.
> To make it possible to enable ZooKeeper authentication on an existing
> cluster without downtime, it should be possible to have all Kafka brokers in
> the cluster first authenticate to the ZooKeeper cluster without ACLs being
> set. Only once all ZooKeeper clients (Kafka brokers and others) are
> authenticating to the ZooKeeper cluster can ACLs start being set.
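
A condensed sketch of the fix direction (hypothetical method, not the actual
broker code): the decision to set ACLs must take the broker config into
account, not just the presence of a JAAS file.

{code}
public class ZkAclDecision {
    // The buggy behavior amounts to: return jaasConfigProvided;
    static boolean shouldSetAcls(boolean jaasConfigProvided, boolean zookeeperSetAcl) {
        return jaasConfigProvided && zookeeperSetAcl;
    }

    public static void main(String[] args) {
        // JAAS file provisioned but zookeeper.set.acl=false: no ACLs may be set,
        // which is what allows the no-downtime migration described above.
        System.out.println(shouldSetAcls(true, false)); // false
    }
}
{code}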





[jira] [Created] (KAFKA-5014) SSL Channel not ready but tcp is established and the server is hung will not sending metadata

2017-04-05 Thread Pengwei (JIRA)
Pengwei created KAFKA-5014:
--

 Summary: SSL Channel not ready but tcp is established and the 
server is hung will not sending metadata
 Key: KAFKA-5014
 URL: https://issues.apache.org/jira/browse/KAFKA-5014
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.2.0, 0.9.0.1
Reporter: Pengwei
Priority: Minor


In our test environment, QA hung one of the brokers the producer was connected
to; the producer then got stuck in the send method and threw the exception:
failed to update metadata after the request timeout.

I found the reason to be as follows: when the producer chose one of the
brokers to send the metadata request to, it connected to the broker, but the
broker was hung; the TCP connection was established and NetworkClient marked
the broker as connected, yet the SSL handshake had not completed, so the
channel was not ready.

NetworkClient then chose this connected node in leastLoadedNode every time it
needed to send a metadata request, even though the node's channel was not
ready.

So the producer was stuck fetching metadata and never tried another node. The
client should not get stuck when only one node is hung.
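
A simplified sketch of the fix direction (hypothetical interfaces, not the
real NetworkClient): leastLoadedNode should skip nodes whose connection is
established at the TCP level but whose channel, e.g. the SSL handshake, is not
yet ready.

{code}
import java.util.List;

interface ClientView {
    boolean isChannelReady(String nodeId);
    int inFlightRequestCount(String nodeId);
}

class LeastLoadedNode {
    static String pick(List<String> nodeIds, ClientView client) {
        String best = null;
        int fewest = Integer.MAX_VALUE;
        for (String id : nodeIds) {
            if (!client.isChannelReady(id)) // key check: skip not-ready channels
                continue;
            int inFlight = client.inFlightRequestCount(id);
            if (inFlight < fewest) {
                fewest = inFlight;
                best = id;
            }
        }
        return best; // null if no node is usable; the caller retries later
    }
}
{code}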





[jira] [Created] (KAFKA-5015) SASL/SCRAM authentication failures are hidden

2017-04-05 Thread JIRA
Johan Ström created KAFKA-5015:
--

 Summary: SASL/SCRAM authentication failures are hidden
 Key: KAFKA-5015
 URL: https://issues.apache.org/jira/browse/KAFKA-5015
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.10.2.0
Reporter: Johan Ström


During experimentation with multiple brokers and SCRAM authentication, the 
brokers didn't seem to connect properly.
Apparently the receiving server does not log connection failures (and their 
cause) unless you enable DEBUG logging on 
org.apache.kafka.common.network.Selector.

Expected: that rejected connections are logged (without a stack trace) without
having to enable DEBUG.

(The root cause of my problem was that I hadn't yet added the user to the 
Zk-backed SCRAM configuration)

The controller flooded controller.log with WARNs:
{code}
[2017-04-05 15:33:42,850] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1's connection to broker kafka02:9093 (id: 1 rack: null) was 
unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to kafka02:9093 (id: 1 rack: null) failed
{code}

The peer did not log anything in any log until debugging was enabled:
{code}
[2017-04-05 15:28:58,373] DEBUG Accepted connection from /10.10.0.5:43670 on 
/10.10.0.6:9093 and assigned it to processor 4, sendBufferSize 
[actual|requested]: [102400|102400] recvBufferSize [actual|requested]: 
[102400|102400] (kafka.network.Acceptor)
[2017-04-05 15:28:58,374] DEBUG Processor 4 listening to new connection from 
/10.10.0.5:43670 (kafka.network.Processor)
[2017-04-05 15:28:58,376] DEBUG Set SASL server state to HANDSHAKE_REQUEST 
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2017-04-05 15:28:58,376] DEBUG Handle Kafka request SASL_HANDSHAKE 
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2017-04-05 15:28:58,378] DEBUG Using SASL mechanism 'SCRAM-SHA-512' provided 
by client 
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2017-04-05 15:28:58,381] DEBUG Setting SASL/SCRAM_SHA_512 server state to 
RECEIVE_CLIENT_FIRST_MESSAGE 
(org.apache.kafka.common.security.scram.ScramSaslServer)
[2017-04-05 15:28:58,381] DEBUG Set SASL server state to AUTHENTICATE 
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2017-04-05 15:28:58,383] DEBUG Setting SASL/SCRAM_SHA_512 server state to 
FAILED (org.apache.kafka.common.security.scram.ScramSaslServer)
[2017-04-05 15:28:58,383] DEBUG Set SASL server state to FAILED 
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2017-04-05 15:28:58,385] DEBUG Connection with /10.10.0.5 disconnected 
(org.apache.kafka.common.network.Selector)
java.io.IOException: javax.security.sasl.SaslException: Authentication failed: 
Credentials could not be obtained [Caused by javax.security.sasl.SaslException: 
Authentication failed: Invalid user credentials]
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:250)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:71)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
at kafka.network.Processor.poll(SocketServer.scala:494)
at kafka.network.Processor.run(SocketServer.scala:432)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.sasl.SaslException: Authentication failed: 
Credentials could not be obtained [Caused by javax.security.sasl.SaslException: 
Authentication failed: Invalid user credentials]
at 
org.apache.kafka.common.security.scram.ScramSaslServer.evaluateResponse(ScramSaslServer.java:104)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:235)
... 6 more
Caused by: javax.security.sasl.SaslException: Authentication failed: Invalid 
user credentials
at 
org.apache.kafka.common.security.scram.ScramSaslServer.evaluateResponse(ScramSaslServer.java:94)
... 7 more
{code}
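
For reference, the workaround used above to surface the failure is enabling
DEBUG for that selector logger in the broker's log4j.properties (standard
log4j 1.x syntax):

{code}
# Surfaces SASL/SCRAM authentication failures until they are logged at a
# higher level by default.
log4j.logger.org.apache.kafka.common.network.Selector=DEBUG
{code}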







[GitHub] kafka pull request #2813: KAFKA-5014: NetworkClient.leastLoadedNode should c...

2017-04-05 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2813

KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5014-least-loaded-node-should-check-if-channel-is-ready

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2813


commit a418b0157eab6740967852ea601e1f94f30cf630
Author: Ismael Juma 
Date:   2017-04-05T14:13:16Z

KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready






[jira] [Commented] (KAFKA-5014) SSL Channel not ready but tcp is established and the server is hung will not sending metadata

2017-04-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956917#comment-15956917
 ] 

Ismael Juma commented on KAFKA-5014:


Thanks for the bug report. Is the fix simply 
https://github.com/apache/kafka/pull/2813/files ?

> SSL Channel not ready but tcp is established and the server is hung will not 
> sending metadata
> -
>
> Key: KAFKA-5014
> URL: https://issues.apache.org/jira/browse/KAFKA-5014
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.2.0
>Reporter: Pengwei
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> In our test environment, QA hung one of the brokers the producer was
> connected to; the producer then got stuck in the send method and threw the
> exception: failed to update metadata after the request timeout.
> I found the reason to be as follows: when the producer chose one of the
> brokers to send the metadata request to, it connected to the broker, but the
> broker was hung; the TCP connection was established and NetworkClient marked
> the broker as connected, yet the SSL handshake had not completed, so the
> channel was not ready.
> NetworkClient then chose this connected node in leastLoadedNode every time
> it needed to send a metadata request, even though the node's channel was not
> ready.
> So the producer was stuck fetching metadata and never tried another node.
> The client should not get stuck when only one node is hung.





[jira] [Updated] (KAFKA-5014) SSL Channel not ready but tcp is established and the server is hung will not sending metadata

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5014:
---
Fix Version/s: 0.11.0.0

> SSL Channel not ready but tcp is established and the server is hung will not 
> sending metadata
> -
>
> Key: KAFKA-5014
> URL: https://issues.apache.org/jira/browse/KAFKA-5014
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.2.0
>Reporter: Pengwei
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> In our test environment, QA hung one of the brokers the producer was
> connected to; the producer then got stuck in the send method and threw the
> exception: failed to update metadata after the request timeout.
> I found the reason to be as follows: when the producer chose one of the
> brokers to send the metadata request to, it connected to the broker, but the
> broker was hung; the TCP connection was established and NetworkClient marked
> the broker as connected, yet the SSL handshake had not completed, so the
> channel was not ready.
> NetworkClient then chose this connected node in leastLoadedNode every time
> it needed to send a metadata request, even though the node's channel was not
> ready.
> So the producer was stuck fetching metadata and never tried another node.
> The client should not get stuck when only one node is hung.





[jira] [Commented] (KAFKA-5014) SSL Channel not ready but tcp is established and the server is hung will not sending metadata

2017-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956915#comment-15956915
 ] 

ASF GitHub Bot commented on KAFKA-5014:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2813

KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5014-least-loaded-node-should-check-if-channel-is-ready

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2813


commit a418b0157eab6740967852ea601e1f94f30cf630
Author: Ismael Juma 
Date:   2017-04-05T14:13:16Z

KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready




> SSL Channel not ready but tcp is established and the server is hung will not 
> sending metadata
> -
>
> Key: KAFKA-5014
> URL: https://issues.apache.org/jira/browse/KAFKA-5014
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.2.0
>Reporter: Pengwei
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> In our test environment, QA hung one of the brokers the producer was
> connected to; the producer then got stuck in the send method and threw the
> exception: failed to update metadata after the request timeout.
> I found the reason to be as follows: when the producer chose one of the
> brokers to send the metadata request to, it connected to the broker, but the
> broker was hung; the TCP connection was established and NetworkClient marked
> the broker as connected, yet the SSL handshake had not completed, so the
> channel was not ready.
> NetworkClient then chose this connected node in leastLoadedNode every time
> it needed to send a metadata request, even though the node's channel was not
> ready.
> So the producer was stuck fetching metadata and never tried another node.
> The client should not get stuck when only one node is hung.





[jira] [Created] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress

2017-04-05 Thread Domenico Di Giulio (JIRA)
Domenico Di Giulio created KAFKA-5016:
-

 Summary: Consumer hang in poll method while rebalancing is in 
progress
 Key: KAFKA-5016
 URL: https://issues.apache.org/jira/browse/KAFKA-5016
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.0, 0.10.1.0
Reporter: Domenico Di Giulio
 Attachments: Kafka 0.10.2.0 Issue (TRACE).txt

After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
rebalancing code. 

This is a test case, not (yet) production code. It does the following with a 
single-partition topic and two consumers in the same group:

1) a topic with one partition is forced to be created (auto-created)
2) a producer is used to write 10 messages
3) the first consumer reads all the messages and commits
4) the second consumer attempts a poll() and hangs indefinitely

The same issue can't be found with 0.10.0.0.

See the attached logs at TRACE level. Look for "SERVER HANGS" to see where the
hang occurs: when this happens, the client keeps failing every heartbeat
attempt, as the rebalancing is in progress, and the poll method hangs
indefinitely.
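
A condensed sketch of the reproduction steps (hypothetical topic/group names;
assumes a running broker and the 10 messages already produced):

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollHangRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "repro-group");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // Step 3: the first consumer reads all the messages and commits.
        KafkaConsumer<String, String> first = new KafkaConsumer<>(props);
        first.subscribe(Collections.singletonList("repro-topic"));
        first.poll(5000);
        first.commitSync();

        // Step 4: the second consumer joins the group and polls; this is the
        // call observed to hang while the rebalance is in progress.
        KafkaConsumer<String, String> second = new KafkaConsumer<>(props);
        second.subscribe(Collections.singletonList("repro-topic"));
        second.poll(5000);
    }
}
{code}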





[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress

2017-04-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956928#comment-15956928
 ] 

Ismael Juma commented on KAFKA-5016:


Thanks for the bug report. [~vahid], would you be interested in investigating 
this one?

> Consumer hang in poll method while rebalancing is in progress
> -
>
> Key: KAFKA-5016
> URL: https://issues.apache.org/jira/browse/KAFKA-5016
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0, 0.10.2.0
>Reporter: Domenico Di Giulio
> Attachments: Kafka 0.10.2.0 Issue (TRACE).txt
>
>
> After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
> rebalancing code. 
> This is a test case, not (yet) production code. It does the following with
> a single-partition topic and two consumers in the same group:
> 1) a topic with one partition is forced to be created (auto-created)
> 2) a producer is used to write 10 messages
> 3) the first consumer reads all the messages and commits
> 4) the second consumer attempts a poll() and hangs indefinitely
> The same issue can't be found with 0.10.0.0.
> See the attached logs at TRACE level. Look for "SERVER HANGS" to see where
> the hang occurs: when this happens, the client keeps failing every heartbeat
> attempt, as the rebalancing is in progress, and the poll method hangs
> indefinitely.





[GitHub] kafka pull request #2814: change language level from java 7 to 8

2017-04-05 Thread radai-rosenblatt
GitHub user radai-rosenblatt opened a pull request:

https://github.com/apache/kafka/pull/2814

change language level from java 7 to 8

now that KIP-118 has passed, and there are no major releases coming before 
0.11 

Signed-off-by: radai-rosenblatt 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radai-rosenblatt/kafka java8-ftw

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2814.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2814


commit 0709f9613e9f8f729cf2cc5cf0927ce8ded70def
Author: radai-rosenblatt 
Date:   2017-04-05T14:17:05Z

change language level from java 7 to 8

Signed-off-by: radai-rosenblatt 






[jira] [Commented] (KAFKA-3455) Connect custom processors with the streams DSL

2017-04-05 Thread Nikki Thean (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956941#comment-15956941
 ] 

Nikki Thean commented on KAFKA-3455:


Correct.

> Connect custom processors with the streams DSL
> --
>
> Key: KAFKA-3455
> URL: https://issues.apache.org/jira/browse/KAFKA-3455
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Jonathan Bender
>  Labels: user-experience
> Fix For: 0.11.0.0
>
>
> From the kafka users email thread, we discussed the idea of connecting custom 
> processors with topologies defined from the Streams DSL (and being able to 
> sink data from the processor).  Possibly this could involve exposing the 
> underlying processor's name in the streams DSL so it can be connected with 
> the standard processor API.
> {quote}
> Thanks for the feedback. This is definitely something we wanted to support
> in the Streams DSL.
> One tricky thing, though, is that some operations do not translate to a
> single processor, but a sub-graph of processors (think of a stream-stream
> join, which is translated to actually 5 processors for windowing / state
> queries / merging, each with a different internal name). So how to define
> the API to return the processor name needs some more thinking.
> {quote}





[jira] [Commented] (KAFKA-4943) SCRAM secret's should be better protected with Zookeeper ACLs

2017-04-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956945#comment-15956945
 ] 

Johan Ström commented on KAFKA-4943:


Another possibly bad thing is that Kafka logs the credentials in the clear too 
(0.10.2.0):

{code}
[2017-04-05 16:29:00,266] INFO Processing notification(s) to /config/changes 
(kafka.common.ZkNodeChangeNotificationListener)
[2017-04-05 16:29:00,282] INFO Processing override for entityPath: users/kafka 
with config: 
{SCRAM-SHA-512=salt=ZGl6dnRzeWQ5ZjJhNWo1bWdxN2draG96Ng==,stored_key=BEdel+ChGSnpdpV0f8s8J/fWlwZJbUtAD1N6FygpPLK1AiVjg0yiHCvigq1R2x+o72QSvNkyFITuVZMlrj8hZg==,server_key=/RZ/EcGAaXwAKvFknVpsBHzC4tBXBLPJQnN4tM/s0wJpMcR9qvvJTGKM9Nx+zoXCc9buNoCd+/2LpL+yWde+/w==,iterations=4096}
 (kafka.server.DynamicConfigManager)
{code}

> SCRAM secret's should be better protected with Zookeeper ACLs
> -
>
> Key: KAFKA-4943
> URL: https://issues.apache.org/jira/browse/KAFKA-4943
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.2.0
>Reporter: Johan Ström
>Assignee: Rajini Sivaram
> Fix For: 0.10.2.1
>
>
> With the new SCRAM authenticator the secrets are stored in Zookeeper:
> {code}
> get /kafka/config/users/alice
> {"version":1,"config":{"SCRAM-SHA-512":"salt=ODhnZjNkdWZibTV1cG1zdnV6bmh6djF3Mg==,stored_key=BAbHWHuGEb4m5+U+p0M9oFQmOPhU6M7q5jtZY8deDDoZCvxaqVNLz41yPzdgcp1WpiEBmfwYOuFlo9hMFKM7mA==,server_key=JW3KhpMeyUgh0OAC0kejuFUvUSlXBv/Z68tlfOWcMw5f5jrBwyBnjNQ9VZsSYz1AcI9IYaQ5S6H3yN39SieNiA==,iterations=4096"}}
> {code}
> These are stored without any ACL, and zookeeper-security-migration.sh does 
> not seem to change that either:
> {code}
> getAcl /kafka/config/users/alice
> 'world,'anyone
> : cdrwa
> getAcl /kafka/config/users
> 'world,'anyone
> : cdrwa
> getAcl /kafka
> 'world,'anyone
> : r
> 'sasl,'bob
> : cdrwa
> getAcl /kafka/config/changes
> 'world,'anyone
> : r
> 'sasl,'bob
> : cdrwa
> {code}
> The above output is after running the security migrator; for some reason
> /kafka/config/users is ignored, but the others are fixed.
> Even if these were to be stored with the secure ZkUtils#DefaultAcls, they
> would be world readable.
> From my (limited) point of view, they should be readable by Kafka only.
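
For illustration, what a locked-down node could look like via zkCli (commands
illustrative only; the principal name would follow the broker's actual SASL
identity, here the 'bob' principal seen in the output above):

{code}
setAcl /kafka/config/users/alice sasl:bob:cdrwa
getAcl /kafka/config/users/alice
'sasl,'bob
: cdrwa
{code}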





Re: [VOTE] KIP-81: Bound Fetch memory usage in the consumer

2017-04-05 Thread Mickael Maison
Thanks to everybody who voted and reviewed the KIP!

The vote passed with 5 binding +1s (Jason, Guozhang, Jun, Ismael,
Becket) and 3 non-binding +1s (Edoardo, Rajini, Radai)


On Mon, Apr 3, 2017 at 5:59 PM, Becket Qin  wrote:
> +1. Thanks for the KIP.
>
> On Mon, Apr 3, 2017 at 4:29 AM, Rajini Sivaram 
> wrote:
>
>> +1 (non-binding)
>>
>> On Fri, Mar 31, 2017 at 5:36 PM, radai  wrote:
>>
>> > possible priorities:
>> >
>> > 1. keepalives/coordination
>> > 2. inter-broker-traffic
>> > 3. produce traffic
>> > 4. consume traffic
>> >
>> > (dont want to start a debate, just to illustrate there may be >2 of them
>> so
>> > int is better than bool)
>> >
>> > On Fri, Mar 31, 2017 at 9:10 AM, Ismael Juma  wrote:
>> >
>> > > +1 from me too, thanks for the KIP.
>> > >
>> > > Ismael
>> > >
>> > > On Fri, Mar 31, 2017 at 5:06 PM, Jun Rao  wrote:
>> > >
>> > > > Hi, Mickael,
>> > > >
>> > > > Thanks for the KIP. +1 from me too.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Thu, Mar 30, 2017 at 4:40 AM, Mickael Maison <
>> > > mickael.mai...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Thanks for the suggestion.
>> > > > >
>> > > > > Currently, I can't think of a scenario when we would need multiple
>> > > > > priority "levels". If in the future it makes sense to have some, I
>> > > > > think we could just make the change without a new KIP as these APIs
>> > > > > are not public.
>> > > > > So I'd be more inclined to keep the boolean for now.
>> > > > >
>> > > > > On Wed, Mar 29, 2017 at 6:13 PM, Edoardo Comar 
>> > > > wrote:
>> > > > > > Hi Mickael,
>> > > > > > as discussed we could change the priority parameter to be an int
>> > > rather
>> > > > > > than a boolean.
>> > > > > >
>> > > > > > That's a bit more extensible
>> > > > > > --
>> > > > > > Edoardo Comar
>> > > > > > IBM MessageHub
>> > > > > > eco...@uk.ibm.com
>> > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
>> > > > > >
>> > > > > > IBM United Kingdom Limited Registered in England and Wales with
>> > > number
>> > > > > > 741598 Registered office: PO Box 41, North Harbour, Portsmouth,
>> > > Hants.
>> > > > > PO6
>> > > > > > 3AU
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > From:   Guozhang Wang 
>> > > > > > To: "dev@kafka.apache.org" 
>> > > > > > Date:   28/03/2017 19:02
>> > > > > > Subject:Re: [VOTE] KIP-81: Bound Fetch memory usage in
>> the
>> > > > > > consumer
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > 1) Makes sense.
>> > > > > > 2) Makes sense. Thanks!
>> > > > > >
>> > > > > > On Tue, Mar 28, 2017 at 10:11 AM, Mickael Maison
>> > > > > > 
>> > > > > > wrote:
>> > > > > >
>> > > > > >> Hi Guozhang,
>> > > > > >>
>> > > > > >> Thanks for the feedback.
>> > > > > >>
>> > > > > >> 1) By MemoryPool, I mean the implementation added in KIP-72.
>> That
>> > > will
>> > > > > >> most likely be SimpleMemoryPool, but the PR for KIP-72 has not
>> > been
>> > > > > >> merged yet.
>> > > > > >> I've updated the KIP to make it more obvious.
>> > > > > >>
>> > > > > >> 2) I was thinking to pass in the priority when creating the
>> > > > > >> Coordinator Node (in
>> > > > > >> https://github.com/apache/kafka/blob/trunk/clients/src/
>> > > > > >> main/java/org/apache/kafka/clients/consumer/internals/
>> > > > > >> AbstractCoordinator.java#L582)
>> > > > > >> Then when calling Selector.connect() (in
>> > > > > >> https://github.com/apache/kafka/blob/trunk/clients/src/
>> > > > > >> main/java/org/apache/kafka/clients/NetworkClient.java#L643)
>> > > > > >> retrieve it and pass it in the Selector so it uses it when
>> > building
>> > > > > >> the Channel.
>> > > > > >> The idea was to avoid having to deduce the connection is for the
>> > > > > >> Coordinator from the ID but instead have it explicitly set by
>> > > > > >> AbstractCoordinator (and pass it all the way down to the
>> Channel)
>> > > > > >>
>> > > > > >> On Tue, Mar 28, 2017 at 1:33 AM, Guozhang Wang <
>> > wangg...@gmail.com>
>> > > > > > wrote:
>> > > > > >> > Mickael,
>> > > > > >> >
>> > > > > >> > Sorry for the late review of the KIP. I'm +1 on the proposed
>> > > change
>> > > > as
>> > > > > >> > well. Just a few minor comments on the wiki itself:
>> > > > > >> >
>> > > > > >> > 1. By the "MemoryPool" are you referring to a new class impl
>> or
>> > to
>> > > > > >> reusing "
>> > > > > >> > org.apache.kafka.clients.producer.internals.BufferPool"? I
>> > assume
>> > > > it
>> > > > > > was
>> > > > > >> > the latter case, and if yes, could you update the wiki page to
>> > > make
>> > > > it
>> > > > > >> > clear?
>> > > > > >> >
>> > > > > >> > 2. I think it is sufficient to add the priority to
>> KafkaChannel
>> > > > class,
>> > > > > >> but
>> > > > > >> > not needed in Node (but one may need to add this parameter to
>> > > > > > Selector#
>> > > > > >> > connect). Could you point me to which usage of Node needs to
>> > > access
>> > > > > > the
>> > > > > >> > priority?
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > G
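To make the mechanism under discussion concrete, here is a minimal, hypothetical
sketch of the KIP-81 idea: a channel created with the boolean priority flag set
(i.e. a coordinator connection) allocates its response buffer outside the bounded
memory pool, so coordinator responses can never be starved when the pool is
exhausted. Everything below is an illustrative stand-in, not the actual Kafka
classes; the real MemoryPool implementation comes from KIP-72.

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for the KIP-72 memory pool.
interface MemoryPool {
    ByteBuffer tryAllocate(int sizeBytes); // null when the pool is exhausted
    void release(ByteBuffer buffer);
}

class SimpleMemoryPoolSketch implements MemoryPool {
    private final AtomicLong available;

    SimpleMemoryPoolSketch(long sizeBytes) { this.available = new AtomicLong(sizeBytes); }

    @Override
    public ByteBuffer tryAllocate(int sizeBytes) {
        if (available.addAndGet(-sizeBytes) < 0) {
            available.addAndGet(sizeBytes); // undo the reservation and fail
            return null;
        }
        return ByteBuffer.allocate(sizeBytes);
    }

    @Override
    public void release(ByteBuffer buffer) { available.addAndGet(buffer.capacity()); }
}

// Stand-in for KafkaChannel carrying the proposed boolean priority flag.
class PrioritizedChannelSketch {
    private final boolean coordinatorConnection;
    private final MemoryPool pool;

    PrioritizedChannelSketch(boolean coordinatorConnection, MemoryPool pool) {
        this.coordinatorConnection = coordinatorConnection;
        this.pool = pool;
    }

    ByteBuffer allocateForResponse(int sizeBytes) {
        // Coordinator traffic is small and must never be starved, so it is
        // allocated outside the pool; fetch responses wait until the pool
        // has room (a null return defers the read).
        return coordinatorConnection ? ByteBuffer.allocate(sizeBytes)
                                     : pool.tryAllocate(sizeBytes);
    }
}
{code}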

Re: [VOTE] KIP-120: Cleanup Kafka Streams builder API

2017-04-05 Thread Sriram Subramanian
+1

On Tue, Apr 4, 2017 at 9:01 PM, Ewen Cheslack-Postava 
wrote:

> +1 (binding)
>
> -Ewen
>
> On Thu, Mar 30, 2017 at 4:03 PM, Guozhang Wang  wrote:
>
> > +1.
> >
> >
> > On Thu, Mar 30, 2017 at 1:18 AM, Damian Guy 
> wrote:
> >
> > > Thanks Matthias.
> > >
> > > +1
> > >
> > > On Thu, 23 Mar 2017 at 22:40 Matthias J. Sax 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to start the VOTE on KIP-120:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%
> > > 3A+Cleanup+Kafka+Streams+builder+API
> > > >
> > > > If you have further comments, please reply to the DISCUSS thread.
> > > >
> > > > Thanks a lot!
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [VOTE] KIP-135 : Send of null key to a compacted topic should throw non-retriable error back to user

2017-04-05 Thread Mayuresh Gharat
Bumping up this thread.

On Mon, Apr 3, 2017 at 4:22 PM, Mayuresh Gharat 
wrote:

> Hi All,
>
> It seems that there is no further concern with the KIP-135. At this point
> we would like to start the voting process. The KIP can be found at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 135+%3A+Send+of+null+key+to+a+compacted+topic+should+throw+
> non-retriable+error+back+to+user
> 
>
> Thanks,
>
> Mayuresh
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


[jira] [Resolved] (KAFKA-4801) Transient test failure (part 2): ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-04-05 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun resolved KAFKA-4801.

Resolution: Fixed

I can't reproduce this anymore, though I could easily do so before. Also, I 
haven't seen this on Jenkins in the last few weeks.

> Transient test failure (part 2): 
> ConsumerBounceTest.testConsumptionWithBrokerFailures
> -
>
> Key: KAFKA-4801
> URL: https://issues.apache.org/jira/browse/KAFKA-4801
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Armin Braun
>Priority: Minor
>  Labels: transient-system-test-failure
>
> There is still a little instability left in the test (when reproducing this, 
> you statistically need more than 100 runs in half the cases)
> {code}
> ConsumerBounceTest.testConsumptionWithBrokerFailures
> {code}
> This results in the exception below being thrown at a relatively low rate (I'd 
> say definitely less than 0.5% of all runs on my machine).
> {code}
> kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
> java.lang.IllegalArgumentException: You can only check the position for 
> partitions assigned to this consumer.
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
> {code}
> this was also reported in a comment to the original KAFKA-4198
> https://issues.apache.org/jira/browse/KAFKA-4198?focusedCommentId=15765468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15765468



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak

2017-04-05 Thread Joseph Aliase (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957413#comment-15957413
 ] 

Joseph Aliase commented on KAFKA-5007:
--

[~gwenshap] Please take a look at this; it's a major bug.

> Kafka Replica Fetcher Thread- Resource Leak
> ---
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 0.10.1.1
> Environment: Centos 7
> Java 8
>Reporter: Joseph Aliase
>
> Kafka is running out of open file descriptors when the system network 
> interface is down.
> Issue description:
> We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file 
> descriptor limit for the account running Kafka is set to 10.
> During an upgrade, the network interface went down. The outage continued for 
> 12 hours; eventually all the brokers crashed with a java.io.IOException: Too 
> many open files error.
> We repeated the test in a lower environment and observed that the open socket 
> count keeps increasing while the NIC is down.
> We have around 13 topics with a max partition count of 120, and the number of 
> replica fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the open socket descriptor 
> count for the broker pid continued to increase while the NIC was down, leading 
> to the open file descriptor error.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2776: HOTFIX: WindowedStreamPartitioner does not provide...

2017-04-05 Thread mjsax
Github user mjsax closed the pull request at:

https://github.com/apache/kafka/pull/2776


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3455) Connect custom processors with the streams DSL

2017-04-05 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-3455.

Resolution: Not A Problem

> Connect custom processors with the streams DSL
> --
>
> Key: KAFKA-3455
> URL: https://issues.apache.org/jira/browse/KAFKA-3455
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Jonathan Bender
>  Labels: user-experience
> Fix For: 0.11.0.0
>
>
> From the kafka users email thread, we discussed the idea of connecting custom 
> processors with topologies defined from the Streams DSL (and being able to 
> sink data from the processor).  Possibly this could involve exposing the 
> underlying processor's name in the streams DSL so it can be connected with 
> the standard processor API.
> {quote}
> Thanks for the feedback. This is definitely something we wanted to support
> in the Streams DSL.
> One tricky thing, though, is that some operations do not translate to a
> single processor, but to a sub-graph of processors (think of a stream-stream
> join, which actually translates to 5 processors for windowing / state
> queries / merging, each with a different internal name). So how the API
> should return the processor name needs some more thinking.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-122: Add Reset Consumer Group Offsets tooling

2017-04-05 Thread Jorge Esteban Quilcate Otoya
Thanks to everyone who voted and/or provided feedback!

The vote passed with 4 binding +1s (Gwen, Grant, Jason, Becket) and 5
non-binding +1s (Matthias, Vahid, Bill, Dong, Mickael).

On Wed, Apr 5, 2017 at 11:58, Jorge Esteban Quilcate Otoya (<
quilcate.jo...@gmail.com>) wrote:

> Thanks Ismael! Response inline.
>
> On Tue, Apr 4, 2017 at 17:43, Ismael Juma ()
> wrote:
>
> Sorry for the delay Jorge. Responses inline.
>
> On Thu, Mar 23, 2017 at 5:56 PM, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > @Ismael, thanks for your feedback!
> > 1. Good point. I will add optional support for a timezone as part of the
> > datetime input. But when the datetime has no timezone, would it be more
> > consistent to get the timezone from the cluster first and then reset
> based
> > on that value? I'm not sure if it is possible to get that info from the
> > cluster. But in case that's not available, I could add a note advising
> > that when the timezone is not specified, the tool will take that value from
> > the client, and it would be the user's responsibility to validate that it
> > is aligned with the server.
> >
>
> There's no way to get such data from the Cluster today. It's relatively
> common for servers to use UTC as their timezone though. Is there any value
> in using the client timezone? Today's apps typically have data from all
> over and what are the chances that the time zone from where the client is
> running is the correct one?
>
>
> You're right, it makes sense to use UTC by default and accept a timezone as part
> of the input value. I updated the KIP.
>
>
>
> > 2. Happy to add it to the KIP.
> >
>
> OK.
>
>
> > 3. This was part of the discussion thread; we ended up with `shift-by` to
> > avoid adding `offset` to each case and to make it a bit more consistent.
> >
>
> OK.
>
>
> > 4. I could remove CURRENT-OFFSET and CURRENT-LAG from the output, and
> leave
> > them as part of the `describe` operation, if that's better.
> >
>
> It seems better to me as one would hope people would look at describe to
> find the current values.
>
> Done, KIP updated.
>
>
> > 5. Agreed. At the beginning we considered `shift-plus` and `shift-minus`,
> but
> > agreed to join them into one option and accept +/- as input. Maybe that's a
> > better option?
> >
>
> Not sure, maybe it's fine as it is. I can't think of anything better, at
> least.
>
>
> Agreed, I also think it's good enough as it is now.
>
>
>
> Ismael
>
>
> Jorge.
>
>
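To make the agreed rule concrete, here is a minimal sketch of the datetime
handling discussed above: accept an explicit offset in the --to-datetime value,
and assume UTC when none is given. The parsing details are illustrative only,
not the tool's actual implementation.

{code}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

public class ResetDateTimeParser {
    static Instant parse(String input) {
        try {
            // Input carries an explicit offset, e.g. 2017-04-05T10:00:00+02:00
            return OffsetDateTime.parse(input).toInstant();
        } catch (DateTimeParseException e) {
            // No offset present: interpret the timestamp as UTC.
            return LocalDateTime.parse(input).toInstant(ZoneOffset.UTC);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2017-04-05T10:00:00+02:00")); // 2017-04-05T08:00:00Z
        System.out.println(parse("2017-04-05T10:00:00"));       // 2017-04-05T10:00:00Z
    }
}
{code}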


[VOTE] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-04-05 Thread Florian Hussonnois
Hi All,

I would like to start the vote for the KIP-130 :

https://cwiki.apache.org/confluence/display/KAFKA/KIP+130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API

Thanks,

-- 
Florian HUSSONNOIS


Re: [DISCUSS] KIP-131 : Add access to OffsetStorageReader from SourceConnector

2017-04-05 Thread Florian Hussonnois
Hi All,

Is there any feedback regarding this KIP?
https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+OffsetStorageReader+from+SourceConnector

Thanks,

2017-03-14 22:51 GMT+01:00 Florian Hussonnois :

> Hi Matthias,
>
> Sorry, I didn't know about this page. The KIP has been added to it.
>
> Thanks,
>
> 2017-03-13 21:30 GMT+01:00 Matthias J. Sax :
>
>> Can you please add the KIP to this table:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+
>> Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
>>
>> Thanks,
>>
>>  Matthias
>>
>>
>> On 3/7/17 1:24 PM, Florian Hussonnois wrote:
>> > Hi all,
>> >
>> > I've created a new KIP to add access to OffsetStorageReader from
>> > SourceConnector
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+
>> Add+access+to+OffsetStorageReader+from+SourceConnector
>> >
>> > Thanks.
>> >
>>
>>
>
>
> --
> Florian HUSSONNOIS
>



-- 
Florian HUSSONNOIS
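For readers skimming the thread, here is a sketch of what KIP-131 would enable.
The offsetStorageReader() accessor on the connector's context is the KIP's
proposal, not an existing API, so the nested stand-in interface below is purely
illustrative and the cast would only work once the KIP lands.

{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.storage.OffsetStorageReader;

public class OffsetAwareSourceConnector extends SourceConnector {
    private Map<String, Object> lastOffset;

    @Override
    public void start(Map<String, String> props) {
        // Hypothetical per KIP-131: inspect the last committed offset for a
        // source partition before deciding how to configure tasks.
        OffsetStorageReader reader =
                ((HypotheticalSourceConnectorContext) context).offsetStorageReader();
        lastOffset = reader.offset(Collections.singletonMap("file", "input.txt"));
    }

    @Override public Class<? extends Task> taskClass() { return null; }
    @Override public List<Map<String, String>> taskConfigs(int maxTasks) { return Collections.emptyList(); }
    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }
    @Override public String version() { return "0.1"; }

    // Stand-in for the context method the KIP proposes to expose.
    interface HypotheticalSourceConnectorContext {
        OffsetStorageReader offsetStorageReader();
    }
}
{code}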


[GitHub] kafka pull request #2779: KAFKA-4993: Fix findbugs warnings in kafka-clients

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2779


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4993) Fix findbugs warnings in kafka-clients

2017-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957709#comment-15957709
 ] 

ASF GitHub Bot commented on KAFKA-4993:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2779


> Fix findbugs warnings in kafka-clients
> --
>
> Key: KAFKA-4993
> URL: https://issues.apache.org/jira/browse/KAFKA-4993
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Fix findbugs warnings in kafka-clients



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-4993) Fix findbugs warnings in kafka-clients

2017-04-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4993.

   Resolution: Fixed
Fix Version/s: 0.11.0.0

> Fix findbugs warnings in kafka-clients
> --
>
> Key: KAFKA-4993
> URL: https://issues.apache.org/jira/browse/KAFKA-4993
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 0.11.0.0
>
>
> Fix findbugs warnings in kafka-clients



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4819) Expose states of active tasks to public API

2017-04-05 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4819:
---
Labels: kip  (was: needs-kip)

> Expose states of active tasks to public API
> ---
>
> Key: KAFKA-4819
> URL: https://issues.apache.org/jira/browse/KAFKA-4819
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Florian Hussonnois
>Assignee: Florian Hussonnois
>Priority: Minor
>  Labels: kip
>
> In Kafka 0.10.1.0 the toString method of the KafkaStreams class was 
> implemented mainly to ease debugging of topologies. Also, the streams metrics 
> have been exposed to the public API.
> But currently there is no way to monitor Kafka Streams task states, 
> assignments or consumed offsets.
> I propose to expose the states of active tasks to the KafkaStreams public API.
> For instance, an application could expose a REST API to get the global state 
> of a Kafka Streams topology.
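As a sketch of that use case, assuming KIP-130 exposes some accessor for task
states: the taskStates() call is only hinted at in a comment because it is not
an existing KafkaStreams method, and the JSON body below is a hard-coded
placeholder for whatever such an accessor would return.

{code}
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpServer;

public class TaskStateEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/state", exchange -> {
            // Hypothetical once KIP-130 lands:
            //   byte[] body = toJson(streams.taskStates()).getBytes(UTF_8);
            byte[] body = "{\"0_0\":\"RUNNING\",\"0_1\":\"RUNNING\"}"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start(); // GET http://localhost:8080/state
    }
}
{code}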



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4819) Expose states of active tasks to public API

2017-04-05 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4819:
---
Description: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP+130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
  (was: In Kafka 0.10.1.0 the toString method of the KafkaStreams class was 
implemented mainly to ease debugging of topologies. Also, the streams metrics 
have been exposed to the public API.

But currently there is no way to monitor Kafka Streams task states, assignments 
or consumed offsets.

I propose to expose the states of active tasks to the KafkaStreams public API.

For instance, an application could expose a REST API to get the global state of 
a Kafka Streams topology.)

> Expose states of active tasks to public API
> ---
>
> Key: KAFKA-4819
> URL: https://issues.apache.org/jira/browse/KAFKA-4819
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Florian Hussonnois
>Assignee: Florian Hussonnois
>Priority: Minor
>  Labels: kip
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2786: MINOR: Removed dead field

2017-04-05 Thread original-brownbear
Github user original-brownbear closed the pull request at:

https://github.com/apache/kafka/pull/2786


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-1211) Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset (KIP-101)

2017-04-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957938#comment-15957938
 ] 

Jun Rao commented on KAFKA-1211:


The PR for this issue is https://github.com/apache/kafka/pull/2808

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset (KIP-101)
> --
>
> Key: KAFKA-1211
> URL: https://issues.apache.org/jira/browse/KAFKA-1211
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Ben Stopford
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> Today, during leader failover there is a window of weakness in which the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If the leader also fails during this 
> time, produce requests with ack > 1 that have already been acknowledged can 
> still be lost. To avoid this scenario, we would prefer to hold the produce 
> request in purgatory until the replicas' HW is larger than the produce 
> offset, instead of just their end-of-log offsets.
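A hedged sketch of the completion-check change described above; Replica is a
stand-in type, and the real logic lives in the broker's delayed-produce code,
but the conceptual change is from log end offset (LEO) to high watermark (HW):

{code}
import java.util.List;

class Replica {
    final long logEndOffset;   // LEO: how far this replica has fetched
    final long highWatermark;  // HW: how far this replica knows is committed
    Replica(long leo, long hw) { this.logEndOffset = leo; this.highWatermark = hw; }
}

class DelayedProduceCheck {
    // Current behavior: complete once every in-sync replica has fetched up to
    // the required offset; acknowledged writes can still be lost across a
    // double failover because followers may truncate after this point.
    static boolean canCompleteByLeo(long requiredOffset, List<Replica> isr) {
        return isr.stream().allMatch(r -> r.logEndOffset >= requiredOffset);
    }

    // Proposed behavior: complete only once every in-sync replica's high
    // watermark has passed the required offset.
    static boolean canCompleteByHw(long requiredOffset, List<Replica> isr) {
        return isr.stream().allMatch(r -> r.highWatermark >= requiredOffset);
    }
}
{code}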



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: kafka-trunk-jdk8 #1406

2017-04-05 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-4993; Fix findbugs warnings in kafka-clients

--
[...truncated 340.48 KB...]

kafka.server.KafkaConfigTest > testLogRollTimeMsProvided STARTED

kafka.server.KafkaConfigTest > testLogRollTimeMsProvided PASSED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault STARTED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault PASSED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol PASSED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled STARTED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled PASSED

kafka.server.KafkaConfigTest > testAdvertisePortDefault STARTED

kafka.server.KafkaConfigTest > testAdvertisePortDefault PASSED

kafka.server.KafkaConfigTest > testVersionConfiguration STARTED

kafka.server.KafkaConfigTest > testVersionConfiguration PASSED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol PASSED

kafka.server.SslReplicaFetchTest > testReplicaFetcherThread STARTED

kafka.server.SslReplicaFetchTest > testReplicaFetcherThread PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade STARTED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade PASSED

kafka.server.SaslPlaintextReplicaFetchTest > testReplicaFetcherThread STARTED

kafka.server.SaslPlaintextReplicaFetchTest > testReplicaFetcherThread PASSED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithLeaderThrottle STARTED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithLeaderThrottle PASSED

kafka.server.ReplicationQuotasTest > shouldThrottleOldSegments STARTED

kafka.server.ReplicationQuotasTest > shouldThrottleOldSegments PASSED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithFollowerThrottle STARTED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithFollowerThrottle PASSED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK STARTED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK PASSED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot STARTED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot PASSED

kafka.server.ServerStartupTest > testConflictBrokerStartupWithSamePort STARTED

kafka.server.ServerStartupTest > testConflictBrokerStartupWithSamePort PASSED

kafka.server.ServerStartupTest > testConflictBrokerRegistration STARTED

kafka.server.ServerStartupTest > testConflictBrokerRegistration PASSED

kafka.server.ServerStartupTest > testBrokerSelfAware STARTED

kafka.server.ServerStartupTest > testBrokerSelfAware PASSED

kafka.server.ProduceRequestTest > testSimpleProduceRequest STARTED

kafka.server.ProduceRequestTest > testSimpleProduceRequest PASSED

kafka.server.ProduceRequestTest > testCorruptLz4ProduceRequest STARTED

kafka.server.ProduceRequestTest > testCorruptLz4ProduceRequest PASSED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping STARTED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping PASSED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse STARTED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse PASSED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks STARTED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks PASSED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower STARTED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower PASSED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
STARTED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
PASSED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent STARTED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
PASSED

kafka.server.OffsetCommitTest > testUpdateOffsets STARTED

kafka.server.OffsetCommitTest > testUpdateOffsets PASSED

kafka.server.OffsetCommitTest > testLargeMetadataPayload STARTED

kafka.server.OffsetCommitTest > testLargeMetadata

Jenkins build is back to normal : kafka-trunk-jdk7 #2075

2017-04-05 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-5017) Consider making baseOffset the first offset in message format v2

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5017:
--

 Summary: Consider making baseOffset the first offset in message 
format v2
 Key: KAFKA-5017
 URL: https://issues.apache.org/jira/browse/KAFKA-5017
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


Currently baseOffset starts as the first offset of the batch. If the first 
record is removed by compaction, baseOffset doesn't change and it is no longer 
the same as the first offset of the batch. This is inconsistent with every 
other field in the record batch header and it seems like there is no longer a 
reason for this behaviour.

We should consider simplifying the behaviour so that baseOffset is simply the 
first offset of the record batch. We need to do this before 0.11 or we probably 
won't be able to do it until the next message format version change.

cc [~hachikuji]
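A small worked example of the inconsistency (assuming the v2 layout where a
record's offset is baseOffset + offsetDelta; the numbers are made up):

{code}
public class BaseOffsetExample {
    public static void main(String[] args) {
        // Before compaction: baseOffset = 10, deltas {0, 1, 2} -> offsets 10, 11, 12.
        // Compaction removes the record at delta 0. Today the batch still
        // carries baseOffset = 10 even though its first surviving offset is 11;
        // the proposal would rewrite baseOffset to 11 with deltas {0, 1}.
        long baseOffset = 10L;
        int[] deltasAfterCompaction = {1, 2};
        for (int delta : deltasAfterCompaction)
            System.out.println("record offset = " + (baseOffset + delta)); // 11, 12
    }
}
{code}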



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4935) Consider disabling record level CRC checks for message format V2

2017-04-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958033#comment-15958033
 ] 

Ismael Juma commented on KAFKA-4935:


Whatever we decide, we should remove the following code comment:

{code}
// TODO: The crc is useless for the new message format. Maybe we should let 
writeTo return the written size?
{code}
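For illustration, a minimal sketch of the "lazy CRC" idea from the description
below, assuming memoization on first access is acceptable on the consumer side
(the class and its fields are illustrative, not Kafka's):

{code}
import java.util.zip.CRC32;

public final class LazyCrcRecord {
    private final byte[] key;
    private final byte[] value;
    private Long crc; // computed on first access, then cached

    public LazyCrcRecord(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    public synchronized long checksum() {
        if (crc == null) {
            CRC32 c = new CRC32();
            if (key != null) c.update(key);
            if (value != null) c.update(value);
            crc = c.getValue();
        }
        return crc;
    }
}
{code}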

> Consider disabling record level CRC checks for message format V2
> 
>
> Key: KAFKA-4935
> URL: https://issues.apache.org/jira/browse/KAFKA-4935
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> With the new message format proposed in KIP-98, the record level CRC has been 
> moved to the batch header.
> Because we expose the record-level CRC in `RecordMetadata` and 
> `ConsumerRecord`, we currently compute it eagerly based on the key, value and 
> timestamp even though these methods are rarely used. Ideally, we'd deprecate 
> the relevant methods in `RecordMetadata` and `ConsumerRecord` while making 
> the CRC computation lazy. This seems pretty hard to achieve in the Producer 
> without increasing memory retention, but it may be possible to do in the 
> Consumer.
> An alternative option is to return the batch CRC from the relevant methods.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5018) LogCleaner tests to verify behaviour of message format v2

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5018:
--

 Summary: LogCleaner tests to verify behaviour of message format v2
 Key: KAFKA-5018
 URL: https://issues.apache.org/jira/browse/KAFKA-5018
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


It would be good to add LogCleaner tests to verify the behaviour of fields like 
baseOffset after compaction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak

2017-04-05 Thread Joseph Aliase (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958049#comment-15958049
 ] 

Joseph Aliase commented on KAFKA-5007:
--

[~junrao] Any guidance would be helpful.

> Kafka Replica Fetcher Thread- Resource Leak
> ---
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 0.10.1.1
> Environment: Centos 7
> Java 8
>Reporter: Joseph Aliase
>
> Kafka is running out of open file descriptors when the system network 
> interface is down.
> Issue description:
> We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file 
> descriptor limit for the account running Kafka is set to 10.
> During an upgrade, the network interface went down. The outage continued for 
> 12 hours; eventually all the brokers crashed with a java.io.IOException: Too 
> many open files error.
> We repeated the test in a lower environment and observed that the open socket 
> count keeps increasing while the NIC is down.
> We have around 13 topics with a max partition count of 120, and the number of 
> replica fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the open socket descriptor 
> count for the broker pid continued to increase while the NIC was down, leading 
> to the open file descriptor error.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2815: HOTFIX: break infinite loop if Streams input topic...

2017-04-05 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2815

HOTFIX: break infinite loop if Streams input topics are not available



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka hotfix-infinite-loop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2815


commit 0fece10db85c858c36a56ff7a463636cdf335d70
Author: Matthias J. Sax 
Date:   2017-04-06T00:01:35Z

HOTFIX: break infinite loop if Streams input topics are not available




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-05 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958091#comment-15958091
 ] 

Shuai Lin commented on KAFKA-5010:
--

Hi [~ijuma], I'm afraid I'm not familiar enough with Kafka internals to find a 
fix myself. Do you think someone in the community could work on this?

> Log cleaner crashed with BufferOverflowException when writing to the 
> writeBuffer
> 
>
> Key: KAFKA-5010
> URL: https://issues.apache.org/jira/browse/KAFKA-5010
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Shuai Lin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
> app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
> app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
> app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
> app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
> (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior 
> to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
> app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
> at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
> at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
> at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M 
> to 128M, all with no luck: The log cleaner thread crashed immediately after 
> the broker got restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use 0.9 format because we have old 
> consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB 
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB 
> for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the size of readBuffer and writeBuffer are exactly the same (half 
> of log.cleaner.io.buffer.size), why would the cleaner throw a 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer? IIUC that should never happen because the size of the filtered 
> records should be no greater than the size of the readBuffer, thus no greater 
> than the size of the writeBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-05 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958095#comment-15958095
 ] 

Shuai Lin commented on KAFKA-5010:
--

For now I can think of a quick fix that may help: always keep the capacity of 
the write buffer twice that of the read buffer, as I did in [this 
commit|https://github.com/scrapinghub/kafka/commit/66b0315681b1cbefae941ba68face7fc7f7baa78].
It doesn't fix the root cause, but I think it can temporarily avoid the write 
buffer overflow exception.
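As a standalone illustration of the failure mode (sizes made up): if cleaning
can re-encode records into a larger form, e.g. through recompression or format
conversion, then a write buffer merely equal in size to the read buffer is not
a safe bound, and ByteBuffer.put throws as soon as the output grows past the
destination's capacity.

{code}
import java.nio.ByteBuffer;

public class WriteBufferOverflowDemo {
    public static void main(String[] args) {
        ByteBuffer readBuffer = ByteBuffer.allocate(64);
        ByteBuffer writeBuffer = ByteBuffer.allocate(64); // same size as readBuffer
        readBuffer.put(new byte[64]);           // the input itself fits fine
        byte[] reencodedRecords = new byte[80]; // rewritten output grew past 64 bytes
        writeBuffer.put(reencodedRecords);      // throws java.nio.BufferOverflowException
    }
}
{code}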

> Log cleaner crashed with BufferOverflowException when writing to the 
> writeBuffer
> 
>
> Key: KAFKA-5010
> URL: https://issues.apache.org/jira/browse/KAFKA-5010
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Shuai Lin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
> app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
> app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
> app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
> app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
> (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior 
> to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
> app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
> at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
> at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
> at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M 
> to 128M, all with no luck: The log cleaner thread crashed immediately after 
> the broker got restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use 0.9 format because we have old 
> consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB 
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB 
> for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the size of readBuffer and writeBuffer are exactly the same (half 
> of log.cleaner.io.buffer.size), why would the cleaner throw a 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer? IIUC that should never happen because the size of the filtered 
> records should be no greater that the size of the readBuffer, thus no greater 
> than the size of the writeBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: [VOTE] KIP-112 - Handle disk failure for JBOD

2017-04-05 Thread Becket Qin
+1

Thanks for the KIP. I made a pass and made some minor changes.

On Mon, Apr 3, 2017 at 3:16 PM, radai  wrote:

> +1, LGTM
>
> On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin  wrote:
>
> > Hi all,
> >
> > It seems that there is no further concern with the KIP-112. We would like
> > to start the voting process. The KIP can be found at
> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 112%3A+Handle+disk+failure+for+JBOD
> >  > 112%3A+Handle+disk+failure+for+JBOD>.*
> >
> > Thanks,
> > Dong
> >
>


[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak

2017-04-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958124#comment-15958124
 ] 

Jun Rao commented on KAFKA-5007:


[~joseph.alias...@gmail.com], thanks for reporting this. If the replica fetch 
thread hits an IOException, we always close the connection before creating a 
new one. If you run netstat, do those leaked sockets show up in the CLOSE_WAIT 
state?

> Kafka Replica Fetcher Thread- Resource Leak
> ---
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 0.10.1.1
> Environment: Centos 7
> Java 8
>Reporter: Joseph Aliase
>
> Kafka is running out of open file descriptors when the system network 
> interface is down.
> Issue description:
> We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file 
> descriptor limit for the account running Kafka is set to 10.
> During an upgrade, the network interface went down. The outage continued for 
> 12 hours; eventually all the brokers crashed with a java.io.IOException: Too 
> many open files error.
> We repeated the test in a lower environment and observed that the open socket 
> count keeps increasing while the NIC is down.
> We have around 13 topics with a max partition count of 120, and the number of 
> replica fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the open socket descriptor 
> count for the broker pid continued to increase while the NIC was down, leading 
> to the open file descriptor error.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5019) Exactly-once upgrade notes

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5019:
--

 Summary: Exactly-once upgrade notes
 Key: KAFKA-5019
 URL: https://issues.apache.org/jira/browse/KAFKA-5019
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
Priority: Critical
 Fix For: 0.11.0.0


We have added some basic upgrade notes, but we need to flesh them out. We 
should cover every item that has compatibility implications, as well as new and 
updated protocol APIs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5020) Update protocol documentation to mention message format v2

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5020:
--

 Summary: Update protocol documentation to mention message format v2
 Key: KAFKA-5020
 URL: https://issues.apache.org/jira/browse/KAFKA-5020
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


Sections 5.3, 5.4 and 5.5 should be updated:

https://kafka.apache.org/documentation/#messages

We may want to mention record batches along with message sets here:

https://kafka.apache.org/protocol#protocol_message_sets

And we should update the wiki page linked from the protocol documentation:

https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5021) Update Message Delivery Semantics section to take into account KIP-98

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5021:
--

 Summary: Update Message Delivery Semantics section to take into 
account KIP-98
 Key: KAFKA-5021
 URL: https://issues.apache.org/jira/browse/KAFKA-5021
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


Reference:

https://kafka.apache.org/documentation/#semantics



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5022) Improve CRC tests so that we verify which fields are included in the CRC

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5022:
--

 Summary: Improve CRC tests so that we verify which fields are 
included in the CRC
 Key: KAFKA-5022
 URL: https://issues.apache.org/jira/browse/KAFKA-5022
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma


It would be good to have a test that updates each field in the record/record 
batch, recomputes the CRC and then asserts that the CRC changes or not 
depending on whether that field is meant to be part of the CRC or not.
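A sketch of the test idea using a toy 8-byte "header" in which bytes 0-3 are
covered by the CRC and bytes 4-7 are not; the positions are made up, and a real
test would walk the actual v2 record batch layout (run with java -ea so the
asserts fire):

{code}
import java.util.zip.CRC32;

public class CrcCoverageTest {
    static long crcOfCoveredRegion(byte[] header) {
        CRC32 crc = new CRC32();
        crc.update(header, 0, 4); // only the covered bytes
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] header = new byte[8];
        long before = crcOfCoveredRegion(header);

        header[2] ^= 1; // mutate a covered field -> CRC must change
        assert crcOfCoveredRegion(header) != before;

        header[2] ^= 1; // restore
        header[6] ^= 1; // mutate an uncovered field -> CRC must not change
        assert crcOfCoveredRegion(header) == before;

        System.out.println("CRC coverage assertions passed");
    }
}
{code}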



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5023) Remove cyclic dependency between record and protocol packages

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5023:
--

 Summary: Remove cyclic dependency between record and protocol 
packages
 Key: KAFKA-5023
 URL: https://issues.apache.org/jira/browse/KAFKA-5023
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma


Dependencies should flow in one direction. We currently have Struct under the 
protocol.types package and Records under the record package.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5024) Old clients don't support message format V2

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5024:
--

 Summary: Old clients don't support message format V2
 Key: KAFKA-5024
 URL: https://issues.apache.org/jira/browse/KAFKA-5024
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


Is this OK? If so, we can close this JIRA, but we should make that decision 
consciously.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2814: change language level from java 7 to 8

2017-04-05 Thread radai-rosenblatt
Github user radai-rosenblatt closed the pull request at:

https://github.com/apache/kafka/pull/2814


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-05 Thread Shuai Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Lin updated KAFKA-5010:
-
Description: 
After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
BufferOverflowException when writing the filtered records into the writeBuffer:

{code}
[2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
(kafka.log.LogCleaner)
[2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
app-topic-20170317-20. (kafka.log.LogCleaner)
[2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
app-topic-20170317-20... (kafka.log.LogCleaner)
[2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
(kafka.log.LogCleaner)
[2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
app-topic-20170317-20 complete. (kafka.log.LogCleaner)
[2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
(cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior to 
Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
[2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 0, 
retaining deletes. (kafka.log.LogCleaner)
[2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
at 
org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
at 
org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
at 
kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
at scala.collection.immutable.List.foreach(List.scala:378)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
at scala.collection.immutable.List.foreach(List.scala:378)
at kafka.log.Cleaner.clean(LogCleaner.scala:362)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
(kafka.log.LogCleaner)
{code}

I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M to 
128M, all with no luck: The log cleaner thread crashed immediately after the 
broker got restarted. But setting it to 256MB fixed the problem!

Here are the settings for the cluster:
{code}
- log.message.format.version = 0.9.0.0 (we use 0.9 format because we have old 
consumers)
- log.cleaner.enable = 'true'
- log.cleaner.min.cleanable.ratio = '0.1'
- log.cleaner.threads = '1'
- log.cleaner.io.buffer.load.factor = '0.98'
- log.roll.hours = '24'
- log.cleaner.dedupe.buffer.size = 2GB 
- log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB for 
this topic)
- message.max.bytes = 10MB
{code}

Given that the size of readBuffer and writeBuffer are exactly the same (half of 
log.cleaner.io.buffer.size), why would the cleaner throw a 
BufferOverflowException when writing the filtered records into the writeBuffer? 
IIUC that should never happen because the size of the filtered records should 
be no greater than the size of the readBuffer, thus no greater than the size of 
the writeBuffer.



  was:
After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
BufferOverflowException when writing the filtered records into the writeBuffer:

{code}
[2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
(kafka.log.LogCleaner)
[2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
app-topic-20170317-20. (kafka.log.LogCleaner)
[2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
app-topic-20170317-20... (kafka.log.LogCleaner)
[2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
(kafka.log.LogCleaner)
[2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
app-topic-20170317-20 complete. (kafka.log.LogCleaner)
[2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
(cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior to 
Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
[2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
app-topic-20170317-20 (largest timesta

[jira] [Created] (KAFKA-5025) FetchRequestTest should use batches with more than one message

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5025:
--

 Summary: FetchRequestTest should use batches with more than one 
message
 Key: KAFKA-5025
 URL: https://issues.apache.org/jira/browse/KAFKA-5025
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma


As part of the message format changes for KIP-98, FetchRequestTest.produceData 
was changed to always use record batches containing a single message. We should 
restructure the test so that it's more realistic. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958175#comment-15958175
 ] 

Ismael Juma commented on KAFKA-5010:


Was compression enabled?

> Log cleaner crashed with BufferOverflowException when writing to the 
> writeBuffer
> 
>
> Key: KAFKA-5010
> URL: https://issues.apache.org/jira/browse/KAFKA-5010
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Shuai Lin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
> app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
> app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
> app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
> app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
> (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior 
> to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
> app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
> at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
> at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
> at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M 
> to 128M, all with no luck: The log cleaner thread crashed immediately after 
> the broker got restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use 0.9 format because we have old 
> consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB 
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB 
> for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the size of readBuffer and writeBuffer are exactly the same (half 
> of log.cleaner.io.buffer.size), why would the cleaner throw a 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer? IIUC that should never happen because the size of the filtered 
> records should be no greater than the size of the readBuffer, thus no greater 
> than the size of the writeBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5026) DebuggingConsumerId and DebuggingMessageFormatter and message format v2

2017-04-05 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5026:
--

 Summary: DebuggingConsumerId and DebuggingMessageFormatter and 
message format v2
 Key: KAFKA-5026
 URL: https://issues.apache.org/jira/browse/KAFKA-5026
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
 Fix For: 0.11.0.0


[~junrao] suggested the following:

Currently, the broker supports a DebuggingConsumerId mode for the fetch 
request. Should we extend that so that the consumer can read the control 
message as well? Should we also have some kind of DebuggingMessageFormatter so 
that ConsoleConsumer can show all the newly introduced fields in the new 
message format (e.g., pid, epoch, etc.) for debugging purposes?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-05 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958201#comment-15958201
 ] 

Shuai Lin commented on KAFKA-5010:
--

Yeah, from the output of the {{DumpLogSegments}} tool, the topic that triggered 
the crash of the log cleaner is using snappy:

{code}
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
/data/0/app-topic-20170317-0/000108989705.log
...   
offset: 109076700 position: 8395364 NoTimestampType: -1 isvalid: true 
payloadsize: 827 magic: 0 compresscodec: SNAPPY crc: 4084921916
offset: 109076711 position: 8396217 NoTimestampType: -1 isvalid: true 
payloadsize: 1113 magic: 0 compresscodec: SNAPPY crc: 268690509
...
{code}

> Log cleaner crashed with BufferOverflowException when writing to the 
> writeBuffer
> 
>
> Key: KAFKA-5010
> URL: https://issues.apache.org/jira/browse/KAFKA-5010
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Shuai Lin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
> app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
> app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
> app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
> app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
> (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior 
> to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
> app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
> at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
> at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
> at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M 
> to 128M, all with no luck: The log cleaner thread crashed immediately after 
> the broker got restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use the 0.9 format because we 
> have old consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB 
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB 
> for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the readBuffer and writeBuffer are exactly the same size (half 
> of log.cleaner.io.buffer.size), why would the cleaner throw a 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer? IIUC that should never happen because the size of the filtered 
> records should be no greater than the size of the readBuffer, thus no greater 
> than the size of the writeBuffer.
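
The failure mode in the stack trace above reduces to the following toy sketch: 
a destination buffer sized equal to the source is only safe if the rewritten 
output can never be larger than the input, and any expansion on the write path 
(re-compression or format-conversion overhead is a plausible but unconfirmed 
cause here) overflows it:

{code}
import java.nio.ByteBuffer;

// Toy illustration, not the cleaner's actual code path: writing slightly more
// bytes than were read overflows a writeBuffer sized equal to the readBuffer.
public class WriteBufferOverflowSketch {
    public static void main(String[] args) {
        int half = 1024; // stands in for log.cleaner.io.buffer.size / 2
        ByteBuffer writeBuffer = ByteBuffer.allocate(half);

        byte[] retained = new byte[1000]; // records that survive filtering
        byte[] growth = new byte[100];    // hypothetical size growth on rewrite

        writeBuffer.put(retained);
        writeBuffer.put(growth); // throws java.nio.BufferOverflowException
    }
}
{code}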

[jira] [Created] (KAFKA-5027) Kafka Controller Redesign

2017-04-05 Thread Onur Karaman (JIRA)
Onur Karaman created KAFKA-5027:
---

 Summary: Kafka Controller Redesign
 Key: KAFKA-5027
 URL: https://issues.apache.org/jira/browse/KAFKA-5027
 Project: Kafka
  Issue Type: Improvement
Reporter: Onur Karaman
Assignee: Onur Karaman


The goal of this redesign is to improve controller performance, controller 
maintainability, and cluster reliability.

Documentation regarding what's being considered can be found 
[here|https://docs.google.com/document/d/1rLDmzDOGQQeSiMANP0rC2RYp_L7nUGHzFD9MQISgXYM].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5028) Convert Kafka Controller to a single-threaded event queue model

2017-04-05 Thread Onur Karaman (JIRA)
Onur Karaman created KAFKA-5028:
---

 Summary: Convert Kafka Controller to a single-threaded event queue 
model
 Key: KAFKA-5028
 URL: https://issues.apache.org/jira/browse/KAFKA-5028
 Project: Kafka
  Issue Type: Sub-task
Reporter: Onur Karaman
Assignee: Onur Karaman


The goal of this ticket is to improve controller maintainability by simplifying 
the controller's concurrency semantics. The controller code has a lot of shared 
state between several threads using several concurrency primitives. This makes 
the code hard to reason about.

This ticket proposes we convert the controller to a single-threaded event queue 
model. We add a new controller thread which processes events held in an event 
queue. Note that this does not mean we get rid of all threads used by the 
controller. We merely delegate all work that interacts with controller local 
state to this single thread. With only a single thread accessing and modifying 
the controller local state, we no longer need to worry about concurrent access, 
which means we can get rid of the various concurrency primitives used 
throughout the controller.

Performance is expected to match existing behavior since the bulk of the 
existing controller work today already happens sequentially in the ZkClient’s 
single ZkEventThread.
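
As a minimal sketch of the pattern being proposed (class and method names are 
illustrative, not the actual controller code):

{code}
import java.util.concurrent.LinkedBlockingQueue;

// Single-threaded event queue sketch: every mutation of controller-local
// state happens on one dedicated thread, so that state needs no locking.
public class ControllerEventQueueSketch {
    interface ControllerEvent { void process(); }

    private final LinkedBlockingQueue<ControllerEvent> queue = new LinkedBlockingQueue<>();
    private final Thread eventThread = new Thread(() -> {
        try {
            while (true) {
                queue.take().process(); // events are handled strictly one at a time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down when interrupted
        }
    }, "controller-event-thread");

    public void start() { eventThread.start(); }

    // Callers (ZK watchers, request handlers, ...) never touch state directly;
    // they only enqueue events:
    public void enqueue(ControllerEvent event) { queue.add(event); }
}
{code}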



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5028) convert kafka controller to a single-threaded event queue model

2017-04-05 Thread Onur Karaman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Onur Karaman updated KAFKA-5028:

Summary: convert kafka controller to a single-threaded event queue model  
(was: Convert Kafka Controller to a single-threaded event queue model)

> convert kafka controller to a single-threaded event queue model
> ---
>
> Key: KAFKA-5028
> URL: https://issues.apache.org/jira/browse/KAFKA-5028
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>
> The goal of this ticket is to improve controller maintainability by 
> simplifying the controller's concurrency semantics. The controller code has a 
> lot of shared state between several threads using several concurrency 
> primitives. This makes the code hard to reason about.
> This ticket proposes we convert the controller to a single-threaded event 
> queue model. We add a new controller thread which processes events held in an 
> event queue. Note that this does not mean we get rid of all threads used by 
> the controller. We merely delegate all work that interacts with controller 
> local state to this single thread. With only a single thread accessing and 
> modifying the controller local state, we no longer need to worry about 
> concurrent access, which means we can get rid of the various concurrency 
> primitives used throughout the controller.
> Performance is expected to match existing behavior since the bulk of the 
> existing controller work today already happens sequentially in the ZkClient’s 
> single ZkEventThread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5028) convert kafka controller to a single-threaded event queue model

2017-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958317#comment-15958317
 ] 

ASF GitHub Bot commented on KAFKA-5028:
---

GitHub user onurkaraman opened a pull request:

https://github.com/apache/kafka/pull/2816

KAFKA-5028: convert kafka controller to a single-threaded event queue model

The goal of this ticket is to improve controller maintainability by 
simplifying the controller's concurrency semantics. The controller code has a 
lot of shared state between several threads using several concurrency 
primitives. This makes the code hard to reason about.

This ticket proposes we convert the controller to a single-threaded event 
queue model. We add a new controller thread which processes events held in an 
event queue. Note that this does not mean we get rid of all threads used by the 
controller. We merely delegate all work that interacts with controller local 
state to this single thread. With only a single thread accessing and modifying 
the controller local state, we no longer need to worry about concurrent access, 
which means we can get rid of the various concurrency primitives used 
throughout the controller.

Performance is expected to match existing behavior since the bulk of the 
existing controller work today already happens sequentially in the ZkClient’s 
single ZkEventThread.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/onurkaraman/kafka KAFKA-5028

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2816


commit d74cad8874f1a9dd2b781c3033921f562d4e7630
Author: Onur Karaman 
Date:   2017-04-06T05:20:23Z

KAFKA-5028: convert kafka controller to a single-threaded event queue model

The goal of this ticket is to improve controller maintainability by 
simplifying the controller's concurrency semantics. The controller code has a 
lot of shared state between several threads using several concurrency 
primitives. This makes the code hard to reason about.

This ticket proposes we convert the controller to a single-threaded event 
queue model. We add a new controller thread which processes events held in an 
event queue. Note that this does not mean we get rid of all threads used by the 
controller. We merely delegate all work that interacts with controller local 
state to this single thread. With only a single thread accessing and modifying 
the controller local state, we no longer need to worry about concurrent access, 
which means we can get rid of the various concurrency primitives used 
throughout the controller.

Performance is expected to match existing behavior since the bulk of the 
existing controller work today already happens sequentially in the ZkClient’s 
single ZkEventThread.




> convert kafka controller to a single-threaded event queue model
> ---
>
> Key: KAFKA-5028
> URL: https://issues.apache.org/jira/browse/KAFKA-5028
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>
> The goal of this ticket is to improve controller maintainability by 
> simplifying the controller's concurrency semantics. The controller code has a 
> lot of shared state between several threads using several concurrency 
> primitives. This makes the code hard to reason about.
> This ticket proposes we convert the controller to a single-threaded event 
> queue model. We add a new controller thread which processes events held in an 
> event queue. Note that this does not mean we get rid of all threads used by 
> the controller. We merely delegate all work that interacts with controller 
> local state to this single thread. With only a single thread accessing and 
> modifying the controller local state, we no longer need to worry about 
> concurrent access, which means we can get rid of the various concurrency 
> primitives used throughout the controller.
> Performance is expected to match existing behavior since the bulk of the 
> existing controller work today already happens sequentially in the ZkClient’s 
> single ZkEventThread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2816: KAFKA-5028: convert kafka controller to a single-t...

2017-04-05 Thread onurkaraman
GitHub user onurkaraman opened a pull request:

https://github.com/apache/kafka/pull/2816

KAFKA-5028: convert kafka controller to a single-threaded event queue model

The goal of this ticket is to improve controller maintainability by 
simplifying the controller's concurrency semantics. The controller code has a 
lot of shared state between several threads using several concurrency 
primitives. This makes the code hard to reason about.

This ticket proposes we convert the controller to a single-threaded event 
queue model. We add a new controller thread which processes events held in an 
event queue. Note that this does not mean we get rid of all threads used by the 
controller. We merely delegate all work that interacts with controller local 
state to this single thread. With only a single thread accessing and modifying 
the controller local state, we no longer need to worry about concurrent access, 
which means we can get rid of the various concurrency primitives used 
throughout the controller.

Performance is expected to match existing behavior since the bulk of the 
existing controller work today already happens sequentially in the ZkClient’s 
single ZkEventThread.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/onurkaraman/kafka KAFKA-5028

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2816


commit d74cad8874f1a9dd2b781c3033921f562d4e7630
Author: Onur Karaman 
Date:   2017-04-06T05:20:23Z

KAFKA-5028: convert kafka controller to a single-threaded event queue model

The goal of this ticket is to improve controller maintainability by 
simplifying the controller's concurrency semantics. The controller code has a 
lot of shared state between several threads using several concurrency 
primitives. This makes the code hard to reason about.

This ticket proposes we convert the controller to a single-threaded event 
queue model. We add a new controller thread which processes events held in an 
event queue. Note that this does not mean we get rid of all threads used by the 
controller. We merely delegate all work that interacts with controller local 
state to this single thread. With only a single thread accessing and modifying 
the controller local state, we no longer need to worry about concurrent access, 
which means we can get rid of the various concurrency primitives used 
throughout the controller.

Performance is expected to match existing behavior since the bulk of the 
existing controller work today already happens sequentially in the ZkClient’s 
single ZkEventThread.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (KAFKA-5028) convert kafka controller to a single-threaded event queue model

2017-04-05 Thread Onur Karaman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-5028 started by Onur Karaman.
---
> convert kafka controller to a single-threaded event queue model
> ---
>
> Key: KAFKA-5028
> URL: https://issues.apache.org/jira/browse/KAFKA-5028
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>
> The goal of this ticket is to improve controller maintainability by 
> simplifying the controller's concurrency semantics. The controller code has a 
> lot of shared state between several threads using several concurrency 
> primitives. This makes the code hard to reason about.
> This ticket proposes we convert the controller to a single-threaded event 
> queue model. We add a new controller thread which processes events held in an 
> event queue. Note that this does not mean we get rid of all threads used by 
> the controller. We merely delegate all work that interacts with controller 
> local state to this single thread. With only a single thread accessing and 
> modifying the controller local state, we no longer need to worry about 
> concurrent access, which means we can get rid of the various concurrency 
> primitives used throughout the controller.
> Performance is expected to match existing behavior since the bulk of the 
> existing controller work today already happens sequentially in the ZkClient’s 
> single ZkEventThread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5029) cleanup javadocs and logging

2017-04-05 Thread Onur Karaman (JIRA)
Onur Karaman created KAFKA-5029:
---

 Summary: cleanup javadocs and logging
 Key: KAFKA-5029
 URL: https://issues.apache.org/jira/browse/KAFKA-5029
 Project: Kafka
  Issue Type: Sub-task
Reporter: Onur Karaman
Assignee: Onur Karaman






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)