[jira] [Created] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)
Savdeep Singh created KAFKA-4556:


 Summary: unordered messages when multiple topics are combined in 
single topic through stream
 Key: KAFKA-4556
 URL: https://issues.apache.org/jira/browse/KAFKA-4556
 Project: Kafka
  Issue Type: Bug
Reporter: Savdeep Singh






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2275: Fix exception handling in case of file record trun...

2016-12-19 Thread kamilszymanski
GitHub user kamilszymanski opened a pull request:

https://github.com/apache/kafka/pull/2275

Fix exception handling in case of file record truncation during write

In case of file record truncation during a write, improper type usage 
(`AtomicInteger` in place of `int`) caused an `IllegalFormatConversionException` 
to be thrown instead of the intended `KafkaException`.
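
For illustration, a minimal standalone sketch (class and variable names are 
hypothetical, not taken from the Kafka source) of why handing an `AtomicInteger` 
to a `%d` format specifier fails with `IllegalFormatConversionException` while 
the unwrapped `int` formats as expected:

    import java.util.IllegalFormatConversionException;
    import java.util.concurrent.atomic.AtomicInteger;

    public class FormatConversionSketch {
        public static void main(String[] args) {
            AtomicInteger written = new AtomicInteger(42);
            try {
                // %d only accepts integral types such as Integer or Long;
                // an AtomicInteger argument triggers a format exception instead
                // of producing the intended error message.
                String.format("Only %d bytes were written", written);
            } catch (IllegalFormatConversionException e) {
                System.out.println("Formatting itself failed: " + e);
            }
            // Unwrapping to a plain int formats as intended.
            System.out.println(String.format("Only %d bytes were written", written.get()));
        }
    }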

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamilszymanski/kafka 
file_record_truncation_during_write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2275


commit 7107282d1686a66dd3d5e460c0bff50b96edf38e
Author: Kamil Szymanski 
Date:   2016-12-19T12:46:43Z

MINOR: Fix exception handling in case of file record truncation during write




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savdeep Singh updated KAFKA-4556:
-
Description: 
When binding builder with multiple topics, single resultant topic has unordered 
set of messages.
This issue is at millisecond level. When messages with same milisec

> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>Reporter: Savdeep Singh
>
> When binding builder with multiple topics, single resultant topic has 
> unordered set of messages.
> This issue is at millisecond level. When messages with same milisec



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savdeep Singh updated KAFKA-4556:
-
Description: 
When binding the builder with multiple topics, the single resultant topic has an 
unordered set of messages.
This issue occurs at the millisecond level, when messages with the same 
millisecond timestamp are added to the topics.

Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
topics and saves into a resulting topic t3, which has 4 partitions and 4 
consumers (one per partition).

Case: When p1 adds messages with same millisecond timestamp

  was:
When binding builder with multiple topics, single resultant topic has unordered 
set of messages.
This issue is at millisecond level. When messages with same milisec


> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>Reporter: Savdeep Singh
>
> When binding the builder with multiple topics, the single resultant topic has 
> an unordered set of messages.
> This issue occurs at the millisecond level, when messages with the same 
> millisecond timestamp are added to the topics.
> Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
> topics and saves into a resulting topic t3, which has 4 partitions and 4 
> consumers (one per partition).
> Case: When p1 adds messages with same millisecond timestamp



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savdeep Singh updated KAFKA-4556:
-
Description: 
When binding the builder with multiple topics, the single resultant topic has an 
unordered set of messages.
This issue occurs at the millisecond level, when messages with the same 
millisecond timestamp are added to the topics.

Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
topics and saves into a resulting topic t3, which has 4 partitions and 4 
consumers (one per partition).

Case: When p1 adds messages with same millisecond timestamp into t1 and t2 . 
Stream combine and form 

  was:
When binding builder with multiple topics, single resultant topic has unordered 
set of messages.
This issue is at millisecond level. When messages with same milisecond level 
are added in topics.

Scenario :  (1 producer : p1 , 2 topics t1 and t2, streams pick form these 2 
topics and save in resulting t3 topic, 4 partitions of t3 and 4 consumers of 4 
partitions )

Case: When p1 adds messages with same millisecond timestamp


> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>Reporter: Savdeep Singh
>
> When binding the builder with multiple topics, the single resultant topic has 
> an unordered set of messages.
> This issue occurs at the millisecond level, when messages with the same 
> millisecond timestamp are added to the topics.
> Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
> topics and saves into a resulting topic t3, which has 4 partitions and 4 
> consumers (one per partition).
> Case: When p1 adds messages with same millisecond timestamp into t1 and t2 . 
> Stream combine and form 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savdeep Singh updated KAFKA-4556:
-
Description: 
When binding the builder with multiple topics, the single resultant topic has an 
unordered set of messages.
This issue occurs at the millisecond level, when messages with the same 
millisecond timestamp are added to the topics.

Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
topics and saves into a resulting topic t3, which has 4 partitions and 4 
consumers (one per partition).

Case: When p1 adds messages with the same millisecond timestamp into t1 and t2, 
the stream combines them into t3. When t3 is consumed, the messages with the 
same millisecond timestamp appear in a different order.


  was:
When binding builder with multiple topics, single resultant topic has unordered 
set of messages.
This issue is at millisecond level. When messages with same milisecond level 
are added in topics.

Scenario :  (1 producer : p1 , 2 topics t1 and t2, streams pick form these 2 
topics and save in resulting t3 topic, 4 partitions of t3 and 4 consumers of 4 
partitions )

Case: When p1 adds messages with same millisecond timestamp into t1 and t2 . 
Stream combine and form 


> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>Reporter: Savdeep Singh
>
> When binding the builder with multiple topics, the single resultant topic has 
> an unordered set of messages.
> This issue occurs at the millisecond level, when messages with the same 
> millisecond timestamp are added to the topics.
> Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
> topics and saves into a resulting topic t3, which has 4 partitions and 4 
> consumers (one per partition).
> Case: When p1 adds messages with the same millisecond timestamp into t1 and 
> t2, the stream combines them into t3. When t3 is consumed, the messages with 
> the same millisecond timestamp appear in a different order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-19 Thread Savdeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savdeep Singh updated KAFKA-4556:
-
Affects Version/s: 0.10.0.1
  Component/s: streams
   producer 
   consumer

> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer, streams
>Affects Versions: 0.10.0.1
>Reporter: Savdeep Singh
>
> When binding the builder with multiple topics, the single resultant topic has 
> an unordered set of messages.
> This issue occurs at the millisecond level, when messages with the same 
> millisecond timestamp are added to the topics.
> Scenario: 1 producer (p1); 2 topics, t1 and t2; a stream picks from these 2 
> topics and saves into a resulting topic t3, which has 4 partitions and 4 
> consumers (one per partition).
> Case: When p1 adds messages with the same millisecond timestamp into t1 and 
> t2, the stream combines them into t3. When t3 is consumed, the messages with 
> the same millisecond timestamp appear in a different order.
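
For reference, a minimal sketch of the kind of topology described above, using 
the 0.10.0-era Streams API (topic names follow the scenario; the application id 
and bootstrap servers are placeholders). Note that Kafka only guarantees 
ordering within a single partition, so records merged from t1 and t2 into t3 
carry no cross-topic ordering guarantee, even for identical timestamps:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class MergeTopicsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "merge-t1-t2-into-t3");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            KStreamBuilder builder = new KStreamBuilder();
            // Read from both source topics and write everything to the combined topic.
            KStream<String, String> merged = builder.stream("t1", "t2");
            merged.to("t3");

            new KafkaStreams(builder, props).start();
        }
    }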



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2274: Implement topic config for internal topics

2016-12-19 Thread sjmittal
Github user sjmittal closed the pull request at:

https://github.com/apache/kafka/pull/2274


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4540) Suspended tasks that are not assigned to the StreamThread need to be closed before new active and standby tasks are created

2016-12-19 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4540:
--
Status: Patch Available  (was: Open)

> Suspended tasks that are not assigned to the StreamThread need to be closed 
> before new active and standby tasks are created
> ---
>
> Key: KAFKA-4540
> URL: https://issues.apache.org/jira/browse/KAFKA-4540
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> When partition assignment happens we first try to add the active tasks and 
> then add the standby tasks. The problem with this is that a new active task 
> might already be an existing suspended standby task. If this is the case, then 
> when the active task initialises it will throw an exception from RocksDB:
> {{Caused by: org.rocksdb.RocksDBException: IO error: lock 
> /tmp/kafka-streams-7071/kafka-music-charts/1_1/rocksdb/all-songs/LOCK: No 
> locks available}}
> We need to make sure we have removed and closed any no-longer-assigned 
> suspended tasks before creating new tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2016-12-19 Thread Sergey Alaev (JIRA)
Sergey Alaev created KAFKA-4557:
---

 Summary: ConcurrentModificationException in KafkaProducer event 
loop
 Key: KAFKA-4557
 URL: https://issues.apache.org/jira/browse/KAFKA-4557
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.1.0
Reporter: Sergey Alaev


Under heavy load, Kafka producer can stop publishing events. Logs below.

[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message to 
Kafka
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#285 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
deadletter queue
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#286 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
[Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer I/O 
thread:
java.util.ConcurrentModificationException: null
at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
~[na:1.8.0_45]
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
 ~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
~[kafka-clients-0.10.1.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
[2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
[NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching metadata 
with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4557:
---
Labels: reliability  (was: )

> ConcurrentModificationException in KafkaProducer event loop
> ---
>
> Key: KAFKA-4557
> URL: https://issues.apache.org/jira/browse/KAFKA-4557
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Sergey Alaev
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
> deadletter queue
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer 
> I/O thread:
> java.util.ConcurrentModificationException: null
> at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
> ~[na:1.8.0_45]
> at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
>  ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching 
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2016-12-19 Thread Sergey Alaev (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Alaev updated KAFKA-4557:

Description: 
Under heavy load, Kafka producer can stop publishing events. Logs below.

[2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3] 
[NetworkClient] [] [] [] [DEBUG]: Disconnecting from node 2 due to 
request timeout.
[2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3] 
[KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message to 
Kafka
org.apache.kafka.common.errors.NetworkException: The server disconnected before 
a response was received.
[2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3] 
[KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message to 
Kafka
  org.apache.kafka.common.errors.NetworkException: The server disconnected 
before a response was received. (#2 from 2016-12-19T15:01:28.793Z)



[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message to 
Kafka
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#285 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
deadletter queue
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#286 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
[Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer I/O 
thread:
java.util.ConcurrentModificationException: null
at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
~[na:1.8.0_45]
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
 ~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
~[kafka-clients-0.10.1.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
[2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
[NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching metadata 
with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}


  was:
Under heavy load, Kafka producer can stop publishing events. Logs below.

[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message to 
Kafka
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#285 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
[SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
deadletter queue
  org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
events-deadletter-0 due to 30032 ms has passed since batch creation plus linger 
time (#286 from 2016-12-19
T15:01:28.793Z)
[2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
[Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer I/O 
thread:
java.util.ConcurrentModificationException: null
at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
~[na:1.8.0_45]
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
 ~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
~[kafka-clients-0.10.1.0.jar:na]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
~[kafka-clients-0.10.1.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
[2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
[NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching metadata 
with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



> ConcurrentModificationException in KafkaProducer event loop
> ---
>
> Key: KAFKA-4557
> URL: https://issues.apache.org/jira/browse/KAFKA-4557
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Sergey Alaev
>  Labels: reliability
> Fix For: 0.10.2

[jira] [Updated] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4557:
---
Fix Version/s: 0.10.2.0

> ConcurrentModificationException in KafkaProducer event loop
> ---
>
> Key: KAFKA-4557
> URL: https://issues.apache.org/jira/browse/KAFKA-4557
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Sergey Alaev
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
> deadletter queue
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer 
> I/O thread:
> java.util.ConcurrentModificationException: null
> at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
> ~[na:1.8.0_45]
> at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
>  ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching 
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2016-12-19 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761434#comment-15761434
 ] 

Ismael Juma commented on KAFKA-4557:


Thanks for the report. Not clear how this is happening since we synchronize on 
the `Deque` when we access and mutate it. I had a quick look and could not find 
a case where that didn't apply. A more in-depth look is required, it seems.
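
For what it's worth, a tiny standalone sketch (not Kafka code) of the failure 
mode the stack trace points at: ArrayDeque iterators are fail-fast, so a 
structural modification made while another code path is iterating without 
holding the same lock surfaces exactly this exception:

    import java.util.ArrayDeque;
    import java.util.ConcurrentModificationException;
    import java.util.Deque;

    public class DequeFailFastSketch {
        public static void main(String[] args) {
            Deque<String> batches = new ArrayDeque<>();
            batches.add("batch-1");
            batches.add("batch-2");
            try {
                for (String batch : batches) {
                    // Stand-in for a mutation performed without the lock the
                    // iterating code relies on (done on one thread here for
                    // brevity; in the producer it would come from a concurrent
                    // append or abort on the same deque).
                    batches.addLast("re-enqueued-" + batch);
                }
            } catch (ConcurrentModificationException e) {
                System.out.println("Fail-fast iterator tripped: " + e);
            }
        }
    }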

> ConcurrentModificationException in KafkaProducer event loop
> ---
>
> Key: KAFKA-4557
> URL: https://issues.apache.org/jira/browse/KAFKA-4557
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Sergey Alaev
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [] [DEBUG]: Disconnecting from node 2 due to 
> request timeout.
> [2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> [2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received. (#2 from 2016-12-19T15:01:28.793Z)
> 
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
> deadletter queue
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer 
> I/O thread:
> java.util.ConcurrentModificationException: null
> at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
> ~[na:1.8.0_45]
> at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
>  ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching 
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Kafka controlled shutdown hangs when there are large number of topics in the cluster

2016-12-19 Thread Robin, Martin (Nokia - IN/Bangalore)
Hi

We have 9 broker instances in a Kafka cluster spread across 3 Linux machines. 
The 1st machine has 4 broker instances, the 2nd machine has 4 broker instances, 
and the 3rd one has 1 broker instance. There are around 101 topics created in 
the cluster.

We start the brokers as follows:
All 4 brokers are started on the 1st machine
All 4 brokers are started on the 2nd machine
1 broker is started on the 3rd machine

After the brokers have been running for some time, we try to shut down the 
brokers as below:
All 4 brokers are stopped on the 1st machine
4 brokers are stopped on the 2nd machine. While we do this, the Kafka controlled 
shutdown hangs.

This same issue was not seen with 25 topics.

Please let us know if any solution is known for this issue.

Thanks
Martin






Re: [DISCUSS] KIP-102 - Add close with timeout for consumers

2016-12-19 Thread Rajini Sivaram
Thank you for the reviews.

@Becket @Ewen, Agree that making all blocking calls have a timeout will be
trickier and hence the scope of this KIP is limited to close().

@Jay Yes, this should definitely go into release notes, will make sure it
is added. I will add some integration tests with broker failures for
testing the timeout, but they cannot completely eliminate the risk of a
hang. Over time, hopefully system tests will help catch most issues.
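
To make the proposal concrete, a rough sketch of how the new call might look 
from application code, assuming the signature proposed in KIP-102 (a timeout 
value plus TimeUnit, mirroring KafkaProducer.close); the exact API is still 
subject to this discussion, and the broker address, group id and topic name 
below are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BoundedCloseSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            try {
                consumer.subscribe(Collections.singletonList("example-topic"));
                consumer.poll(100);
            } finally {
                // Proposed bounded close: give up after 5 seconds instead of
                // potentially blocking forever if brokers are unreachable.
                consumer.close(5, TimeUnit.SECONDS);
            }
        }
    }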


On Sat, Dec 17, 2016 at 1:15 AM, Jay Kreps  wrote:

> I think this is great. Sounds like one implication is that existing code
> that called close() and hit the timeout would now hang indefinitely. We saw
> this kind of thing a lot in automated testing scenarios where people don't
> correctly sequence their shutdown of client and server. I think this is
> okay, but might be good to include in the release notes.
>
> -jay
>
> On Thu, Dec 15, 2016 at 5:32 AM, Rajini Sivaram 
> wrote:
>
> Hi all,
>
>
>
>
>
> I have just created KIP-102 to add a new close method for consumers with a
>
>
> timeout parameter, making Consumer consistent with Producer:
>
>
>
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 102+-+Add+close+with+timeout+for+consumers
>
>
>
>
>
> Comments and suggestions are welcome.
>
>
>
>
>
> Thank you...
>
>
>
>
>
> Regards,
>
>
>
>
>
> Rajini
>



-- 
Regards,

Rajini


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread radai
regarding efficiency:

I'd like to distinguish between server efficiency (resource utilization of
the broker machine alone) and overall network efficiency (resource
utilization on brokers, producers and consumers, including network traffic).
My proposal is not as resource-efficient on the broker (although it can be,
depending on a few trade-offs and implementation details). HOWEVER, if I look
at the overall efficiency:

   1. Clients would need to either buffer or double-read uncommitted messages.
For N clients reading the stream M times (after restarts and re-consumes)
this would mean an M*N factor in either network bandwidth or disk/memory space
(depending on whether they buffer or re-read), and potentially M*N more
broker-side reads too.
   2. To reduce the broker-side cost several things can be done (this is not
an either-or list, these are cumulative):
      2.1 - keep TX logs in memory (+overflow to disk) - trades disk writes
for TX resiliency.
      2.2 - when "appending" TX logs to real partitions - instead of
reading from the (disk-based) TX log and writing to the partition log (x2 disk
writes), the TX log can be made a segment file (a file rename, with
associated protocol changes). This would avoid double writing by simply
making the TX file part of the partition (for large enough TXs; smaller
ones can be rewritten).
      2.3 - the approach above could be combined with a background "defrag"
- similar in concept to compaction - to further reduce the total number of
resulting files.

I think my main issue with the current proposal, more important than
performance, is the lack of proper "encapsulation" of transactions - I don't
think downstream consumers should ever see uncommitted messages.


On Sun, Dec 18, 2016 at 10:19 PM, Becket Qin  wrote:

> @Jason
>
> Yes, second thought on the number of messages included, the offset delta
> will probably be sufficient. The use case I encounter before for number of
> messages in a message set is an embedded mirror maker on the destination
> broker side which fetches message directly from the source cluster. Ideally
> the destination cluster only needs to check CRC and assign the offsets
> because all the message verification has been done by the source cluster,
> but due to the lack of the number of messages in the message set, we have
> to decompress the message set to increment offsets correctly. By knowing
> the number of the messages in the message set, we can avoid doing that. The
> offset delta will also help. It's just then the offsets may have holes for
> log compacted topics, but that may be fine.
>
> @Apurva
>
> I am not sure if it is true that the consumer will either deliver all the
> message for the entire transaction or none of them from one poll() call. If
> we allow the transactions to be across partitions, unless the consumer
> consumes from all the partitions involved in a transactions, it seems
> impossible for it to deliver *all* the messages in a transaction, right? A
> weaker guarantee is we will deliver all or none of the messages that belong
> to the same transaction in ONE partition, but this would be different from
> the guarantee from the producer side.
>
> My two cents on Radai's sideways partition design:
> 1. If we consider the producer side behavior as doing a two phase commit
> which including the committing the consumer offsets, it is a little awkward
> that we allow uncommitted message goes into the main log and rely on the
> consumer to filter out. So semantic wise I think it would be better if we
> can avoid this. Radai's suggestion is actually intuitive because if the
> brokers do not want to expose uncommitted transactions to the consumer, the
> brokers have to buffer it.
>
> 2. Regarding the efficiency. I think may be it worth looking at the
> efficiency cost v.s benefit. The efficiency includes both server side
> efficiency and consumer side efficiency.
>
> Regarding the server side efficiency, the current proposal would probably
> have better efficiency regardless of whether something goes wrong. Radai's
> suggestion would put more burden on the server side. If nothing goes wrong
> we always pay the cost of having double copy of the transactional messages
> and do not get the semantic benefit. But if something goes wrong, the
> efficiency cost we pay we get us a better semantic.
>
> For the consumer side efficiency, because there is no need to buffer the
> uncommitted messages. The current proposal may have to potentially buffer
> uncommitted messages so it would be less efficient than Radai's suggestion
> when a transaction aborts. When everything goes well, both design seems
> having the similar performance. However, it depends on whether we are
> willing to loosen the consumer side transaction guarantee that I mentioned
> earlier to Apurva.
>
> Currently the biggest pressure on the consumer side is that it has to
> buffer incomplete transactions. There are two reasons for it,
> A. A transaction may be aborted so we cannot expose the messages to the
> users.
> B

Re: Kafka controlled shutdown hangs when there are large number of topics in the cluster

2016-12-19 Thread Gwen Shapira
Can you try setting auto.leader.rebalance.enable=false in your
configuration (for all brokers) and see if it solves this problem?
We've had some reports regarding this feature interfering with
controlled shutdown.
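
For reference, this is roughly what that looks like in each broker's 
server.properties (all 9 brokers; a broker restart is needed for it to take 
effect):

    # Disable the automatic preferred-leader rebalance that can interfere
    # with controlled shutdown on clusters with many partitions.
    auto.leader.rebalance.enable=false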

On Mon, Dec 19, 2016 at 5:02 AM, Robin, Martin (Nokia - IN/Bangalore)
 wrote:
> Hi
>
> We have 9 broker instances in a kafka cluster spread across 3 linux machines. 
> The 1st machine has 4 broker instances. 2nd  machine has 4 broker instances 
> and 3rd one has 1 broker instance.  There are around 101 topics created in 
> the cluster
>
> We start the broker as follows
> All 4 brokers are started on first machine
> All 4 brokers are started on 2nd machine
> 1 broker started on 3rd machine
>
> After brokers were running for sometime, we try to shutdown the brokers as 
> below
> All 4 brokers stopped on 1st machine
> 4 brokers are stopped on 2nd machine While we do this kafka 
> controlled shutdown hangs
>
> This same issue was not seen with 25 topics.
>
> Please let us know if any solution is known to this issue
>
> Thanks
> Martin
>
>
>
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread Jay Kreps
Hey Radai,

I'm not sure if I fully understand what you are proposing, but I
interpreted it to be similar to a proposal we worked through back at
LinkedIn. The proposal was to commit to a central txlog topic, and then
recopy to the destination topic upon transaction commit. The observations on
that approach at the time were the following:

   1. It is cleaner since the output topics have only committed data!
   2. You need full replication on the txlog topic to ensure atomicity. We
   weren't able to come up with a solution where you buffer in memory or use
   renaming tricks the way you are describing. The reason is that once you
   begin committing you must ensure that the commit eventually succeeds to
   guarantee atomicity. If you use a transient store you might commit some
   data and then have a server failure that causes you to lose the rest of the
   transaction.
   3. Having a single log allows the reader to choose a "read uncommitted"
   mode that hands out messages immediately. This is important for cases where
   latency is important, especially for stream processing topologies where
   these latencies stack up across multiple stages.

For the stream processing use case, item (2) is a bit of a deal killer.
This takes the cost of a transient message write (say the intermediate
result of a stream processing topology) from 3x writes (assuming 3x
replication) to 6x writes. This means you basically can't default it on. If
we can in fact get the cost down to a single buffered write (i.e. 1x the
data is written to memory and buffered to disk if the transaction is large)
as in the KIP-98 proposal without too many other negative side effects I
think that could be compelling.

-Jay



On Mon, Dec 19, 2016 at 9:36 AM, radai  wrote:

> regarding efficiency:
>
> I'd like to distinguish between server efficiency (resource utilization of
> the broker machine alone) and overall network efficiency (resource
> utilization on brokers, producers and consumers, including network
> traffic).
> my proposal is not as resource-efficient on the broker (although it can be,
> depends on a few trade offs and implementation details). HOWEVER, if i look
> at the overall efficiency:
>
>1.clients would need to either buffer or double-read uncommitted msgs.
> for N clients reading the stream M times (after re-starts and reconsumes)
> this would mean a M*N factor in either network BW or disk/memory space
> (depends on if buffer vs re-read). potentially N*M more broker-side reads
> too.
>2 to reduce the broker side cost several things can be done (this is not
> an either-or list, these are commulative):
>   2.1 - keep TX logs in mem (+overflow to disk) - trades disk writes
> for TX resiliency
>   2.2 - when "appending" TX logs to real partitions - instead of
> reading from (disk-based) TX log and writing to partition log (x2 disk
> writes) the TX log can be made a segment file (so file rename, with
> associated protocol changes). this would avoid double writing by simply
> making the TX file part of the partition (for large enough TXs. smaller
> ones can be rewritten).
>   2.3 - the approach above could be combined with a background "defrag"
> - similar in concept to compaction - to further reduce the total of
> resulting number of files.
>
> I think my main issue with the current proposal, more important than
> performance, is lack of proper "encapsulation" of transactions - I dont
> think downstream consumers should see uncommitted msgs. ever.
>
>
> On Sun, Dec 18, 2016 at 10:19 PM, Becket Qin  wrote:
>
> > @Jason
> >
> > Yes, second thought on the number of messages included, the offset delta
> > will probably be sufficient. The use case I encounter before for number
> of
> > messages in a message set is an embedded mirror maker on the destination
> > broker side which fetches message directly from the source cluster.
> Ideally
> > the destination cluster only needs to check CRC and assign the offsets
> > because all the message verification has been done by the source cluster,
> > but due to the lack of the number of messages in the message set, we have
> > to decompress the message set to increment offsets correctly. By knowing
> > the number of the messages in the message set, we can avoid doing that.
> The
> > offset delta will also help. It's just then the offsets may have holes
> for
> > log compacted topics, but that may be fine.
> >
> > @Apurva
> >
> > I am not sure if it is true that the consumer will either deliver all the
> > message for the entire transaction or none of them from one poll() call.
> If
> > we allow the transactions to be across partitions, unless the consumer
> > consumes from all the partitions involved in a transactions, it seems
> > impossible for it to deliver *all* the messages in a transaction, right?
> A
> > weaker guarantee is we will deliver all or none of the messages that
> belong
> > to the same transaction in ONE partition, but this would be different
> from
> > the g

[jira] [Updated] (KAFKA-4534) StreamPartitionAssignor only ever updates the partitionsByHostState and metadataWithInternalTopics once.

2016-12-19 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4534:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2256
[https://github.com/apache/kafka/pull/2256]

> StreamPartitionAssignor only ever updates the partitionsByHostState and 
> metadataWithInternalTopics once.
> 
>
> Key: KAFKA-4534
> URL: https://issues.apache.org/jira/browse/KAFKA-4534
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> StreamPartitionAssignor only ever updates the partitionsByHostState and 
> metadataWithInternalTopics once. This results in incorrect metadata on 
> rebalances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2256: KAFKA-4534: StreamPartitionAssignor only ever upda...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2256


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4534) StreamPartitionAssignor only ever updates the partitionsByHostState and metadataWithInternalTopics once.

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762205#comment-15762205
 ] 

ASF GitHub Bot commented on KAFKA-4534:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2256


> StreamPartitionAssignor only ever updates the partitionsByHostState and 
> metadataWithInternalTopics once.
> 
>
> Key: KAFKA-4534
> URL: https://issues.apache.org/jira/browse/KAFKA-4534
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> StreamPartitionAssignor only ever updates the partitionsByHostState and 
> metadataWithInternalTopics once. This results in incorrect metadata on 
> rebalances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread Sriram Subramanian
Radai,

I think it is important to understand the key requirements that we don’t
want to compromise. We can then understand the tradeoffs of the different
approaches. We did in fact start with the double journal approach a couple of
years back. I will highlight the must-have requirements first and then
explain the trade-offs based on my understanding.

1. End to end latency for stream processing - This is probably one of the
biggest reasons to support transactions in Kafka. We would like to support
very low latency for end to end processing across stream topologies. This 
means you would want your downstream processors to see the output of your
processing immediately. The low latency is a requirement even if we only
expose committed messages.

2. Speculative execution - We would like to go one step further for stream
processing. 99% of the transactions will always succeed. We would like to
take advantage of this and process the messages optimistically even if the
transactions are still unfinished. If the transactions abort, we would do a
cascading abort across the topology. This helps us to complete all the
processing and keep the output ready and expose them once the transactions
are committed. This will help us to significantly bring down the latency
for end to end stream processing and provide the ability to keep exactly
once as the default setting.

3. IO and memory constraints - We would want a solution that takes 2x the
number of writes. This will bring down broker utilization by half. I don’t
really understand the in memory solution (would be useful if you can
explain it more if you think it solves these goals) but the same resource
constraints apply. What has made Kafka successful is the ability to run
very high throughput clusters with very few machines. We would like to keep
this true when a cluster is largely dominated by stream processing
workloads.

4. Provide both read committed and read uncommitted isolation levels - This
is actually a desired feature. This is similar to database isolation levels
(except that we provide only two of them for now). Downstream systems that
need strong guarantees with some performance impact can choose read
committed isolation level. Systems that want to optimize for performance
and can live with approximations would choose read uncommitted options.
This helps to nicely decouple downstream users that would like to share
topics but have different end goals.

There are other obvious goals like correctness of the protocol and
simplicity of the design that need to be true by default.

Given these goals, the double journal approach was a non starter to enable
low end to end latency and did not provide the ability to do speculative
execution in the future. We also found the resource constraints
(specifically IO/Network) to be unacceptable.

We did understand the complexity of the consumers but it was the best
tradeoff considering the other must have goals. We also thought of another
approach to push the consumer buffering to the broker side. This would
enable multiple consumer groups to share the same buffer pool for a
specific topic partition. However, in the worst case, you would need to
bring the entire log into memory to remove the aborted transaction (for a
consumer that is catching up from time 0). This would also make us lose 
zero-copy semantics.

I would be excited to hear an option that can solve our must have goals and
still keep the consumer really thin. The abstraction seems fine since we
allow the end users to pick the guarantees they need.

On Mon, Dec 19, 2016 at 10:56 AM, Jay Kreps  wrote:

> Hey Radai,
>
> I'm not sure if I fully understand what you are proposing, but I
> interpreted it to be similar to a proposal we worked through back at
> LinkedIn. The proposal was to commit to a central txlog topic, and then
> recopy to the destination topic upon transaction commit. The observation on
> that approach at the time were the following:
>
>1. It is cleaner since the output topics have only committed data!
>2. You need full replication on the txlog topic to ensure atomicity. We
>weren't able to come up with a solution where you buffer in memory or
> use
>renaming tricks the way you are describing. The reason is that once you
>begin committing you must ensure that the commit eventually succeeds to
>guarantee atomicity. If you use a transient store you might commit some
>data and then have a server failure that causes you to lose the rest of
> the
>transaction.
>3. Having a single log allows the reader to choose a "read uncommitted"
>mode that hands out messages immediately. This is important for cases
> where
>latency is important, especially for stream processing topologies where
>these latencies stack up across multiple stages.
>
> For the stream processing use case, item (2) is a bit of a deal killer.
> This takes the cost of a transient message write (say the intermediate
> result of a stream processing t

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread Sriram Subramanian
small correction in my third point -

3. IO and memory constraints - We would want a solution that *does not take*
2x the number of writes.

On Mon, Dec 19, 2016 at 12:37 PM, Sriram Subramanian 
wrote:

> Radai,
>
> I think it is important to understand the key requirements that we don’t
> want to compromise. We can then understand the tradeoffs of the different
> approaches. We did in fact start with the double journal approach couple of
> years back. I will highlight the must have requirements first and then
> explain the trade offs based on my understanding.
>
> 1. End to end latency for stream processing - This is probably one of the
> biggest reasons to support transactions in Kafka. We would like to support
> very low latency for end to end processing across steam topologies. This
> means you would want your downstream processors to see the output of your
> processing immediately. The low latency is a requirement even if we only
> expose committed messages.
>
> 2. Speculative execution - We would like to go one step further for stream
> processing. 99% of the transactions will always succeed. We would like to
> take advantage of this and process the messages optimistically even if the
> transactions are still unfinished. If the transactions abort, we would do a
> cascading abort across the topology. This helps us to complete all the
> processing and keep the output ready and expose them once the transactions
> are committed. This will help us to significantly bring down the latency
> for end to end stream processing and provide the ability to keep exactly
> once as the default setting.
>
> 3. IO and memory constraints - We would want a solution that takes 2x the
> number of writes. This will bring down broker utilization by half. I don’t
> really understand the in memory solution (would be useful if you can
> explain it more if you think it solves these goals) but the same resource
> constraints apply. What has made Kafka successful is the ability to run
> very high throughput clusters with very few machines. We would like to keep
> this true when a cluster is largely dominated by stream processing
> workloads.
>
> 4. Provide both read committed and read uncommitted isolation levels -
> This is actually a desired feature. This is similar to database isolation
> levels (except that we provide only two of them for now). Downstream
> systems that need strong guarantees with some performance impact can choose
> read committed isolation level. Systems that want to optimize for
> performance and can live with approximations would choose read uncommitted
> options. This helps to nicely decouple downstream users that would like to
> share topics but have different end goals.
>
> There are other obvious goals like correctness of the protocol and
> simplicity of the design that needs to be true by default.
>
> Given these goals, the double journal approach was a non starter to enable
> low end to end latency and did not provide the ability to do speculative
> execution in the future. We also found the resource constraints
> (specifically IO/Network) to be unacceptable.
>
> We did understand the complexity of the consumers but it was the best
> tradeoff considering the other must have goals. We also thought of another
> approach to push the consumer buffering to the broker side. This would
> enable multiple consumer groups to share the same buffer pool for a
> specific topic partition. However, in the worst case, you would need to
> bring the entire log into memory to remove the aborted transaction (for a
> consumer that is catching up from time 0). This would also make us loose
> zero copy semantics.
>
> I would be excited to hear an option that can solve our must have goals
> and still keep the consumer really thin. The abstraction seems fine since
> we allow the end users to pick the guarantees they need.
>
> On Mon, Dec 19, 2016 at 10:56 AM, Jay Kreps  wrote:
>
>> Hey Radai,
>>
>> I'm not sure if I fully understand what you are proposing, but I
>> interpreted it to be similar to a proposal we worked through back at
>> LinkedIn. The proposal was to commit to a central txlog topic, and then
>> recopy to the destination topic upon transaction commit. The observation
>> on
>> that approach at the time were the following:
>>
>>1. It is cleaner since the output topics have only committed data!
>>2. You need full replication on the txlog topic to ensure atomicity. We
>>weren't able to come up with a solution where you buffer in memory or
>> use
>>renaming tricks the way you are describing. The reason is that once you
>>begin committing you must ensure that the commit eventually succeeds to
>>guarantee atomicity. If you use a transient store you might commit some
>>data and then have a server failure that causes you to lose the rest
>> of the
>>transaction.
>>3. Having a single log allows the reader to choose a "read uncommitted"
>>mode that hands out messages immediately. Th

[GitHub] kafka pull request #2272: KAFKA-4553: Improve round robin assignment in Conn...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2272


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4553) Connect's round robin assignment produces undesirable distribution of connectors/tasks

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762247#comment-15762247
 ] 

ASF GitHub Bot commented on KAFKA-4553:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2272


> Connect's round robin assignment produces undesirable distribution of 
> connectors/tasks
> --
>
> Key: KAFKA-4553
> URL: https://issues.apache.org/jira/browse/KAFKA-4553
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.2.0
>
>
> Currently the round robin assignment in Connect looks something like this:
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
> assign task to next worker
>   }
> }
> For the most part we assume that connectors and tasks are effectively 
> equivalent units of work, but this is actually rarely the case. Connectors 
> are usually much lighter weight as they are just monitoring for changes in the 
> source/sink system and tasks are doing the heavy lifting. The way we are 
> currently doing round robin assignment then causes uneven distributions of 
> work in some cases that are not too uncommon.
> In particular, it gets bad if there are an even number of workers and 
> connectors that generate only a single task since this results in the even 
> #'d workers always getting assigned connectors and odd workers always getting 
> assigned tasks. An extreme case of this is when users start distributed mode 
> clusters with just a couple of workers to get started and deploy multiple 
> single-task connectors (e.g. CDC connectors like Debezium would be a common 
> example). All the connectors end up on one worker, all the tasks end up on 
> the other, and the second worker becomes overloaded.
> Although the ideal solution to this problem is to have a better idea of how 
> much load each connector/task will generate, I don't think we want to get 
> into the business of full-on cluster resource management. An alternative 
> which I think avoids this common pitfall without the risk of hitting another 
> common bad case is to change the algorithm to assign all the connectors 
> first, then all the tasks, i.e.
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
> assign task to next worker
>   }
> }
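A minimal sketch of the connectors-first, two-pass round robin described above, assuming a hypothetical standalone helper (this is not the code in Connect's actual assignor, just the shape of the algorithm):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Connect's real assignor: round-robin all
// connectors first, then all tasks, so that light connector work and
// heavy task work both spread evenly across workers.
public class ConnectorsFirstAssignment {

    public static Map<String, List<String>> assign(List<String> workers,
                                                   Map<String, Integer> tasksPerConnector) {
        Map<String, List<String>> assignment = new HashMap<>();
        for (String worker : workers)
            assignment.put(worker, new ArrayList<>());

        int next = 0;
        // Pass 1: connectors only.
        for (String connector : tasksPerConnector.keySet())
            assignment.get(workers.get(next++ % workers.size())).add("connector:" + connector);

        // Pass 2: tasks only, continuing the same round-robin cursor.
        for (Map.Entry<String, Integer> entry : tasksPerConnector.entrySet())
            for (int task = 0; task < entry.getValue(); task++)
                assignment.get(workers.get(next++ % workers.size())).add(entry.getKey() + "-" + task);

        return assignment;
    }

    public static void main(String[] args) {
        // Two workers, two single-task connectors: each worker now gets one
        // connector and one task, instead of one worker getting both tasks.
        Map<String, Integer> connectors = new LinkedHashMap<>();
        connectors.put("cdc-a", 1);
        connectors.put("cdc-b", 1);
        System.out.println(assign(java.util.Arrays.asList("worker1", "worker2"), connectors));
    }
}
{code}

With the original interleaved loop, the same input puts both connectors on worker1 and both tasks on worker2, which is exactly the imbalance described in the ticket.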



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4553) Connect's round robin assignment produces undesirable distribution of connectors/tasks

2016-12-19 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4553:
---
   Resolution: Fixed
Fix Version/s: 0.10.2.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2272
[https://github.com/apache/kafka/pull/2272]

> Connect's round robin assignment produces undesirable distribution of 
> connectors/tasks
> --
>
> Key: KAFKA-4553
> URL: https://issues.apache.org/jira/browse/KAFKA-4553
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.2.0
>
>
> Currently the round robin assignment in Connect looks something like this:
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
> assign task to next member
>   }
> }
> For the most part we assume that connectors and tasks are effectively 
> equivalent units of work, but this is actually rarely the case. Connectors 
> are usually much lighter weight as they are just monitoring for changes in the 
> source/sink system and tasks are doing the heavy lifting. The way we are 
> currently doing round robin assignment then causes uneven distributions of 
> work in some cases that are not too uncommon.
> In particular, it gets bad if there are an even number of workers and 
> connectors that generate only a single task since this results in the even 
> #'d workers always getting assigned connectors and odd workers always getting 
> assigned tasks. An extreme case of this is when users start distributed mode 
> clusters with just a couple of workers to get started and deploy multiple 
> single-task connectors (e.g. CDC connectors like Debezium would be a common 
> example). All the connectors end up on one worker, all the tasks end up on 
> the other, and the second worker becomes overloaded.
> Although the ideal solution to this problem is to have a better idea of how 
> much load each connector/task will generate, I don't think we want to get 
> into the business of full-on cluster resource management. An alternative 
> which I think avoids this common pitfall without the risk of hitting another 
> common bad case is to change the algorithm to assign all the connectors 
> first, then all the tasks, i.e.
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
> assign task to next worker
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread Jay Kreps
Hey Michael,

Here is the compatibility concern I have:

   1. You have a consumer app that relies on value == null to indicate a
   delete (current semantics).
   2. You upgrade Kafka and your clients.
   3. Some producer starts using the tombstone field in combination with
   non-null.

I share Ismael's dislike of setting tombstones on records with null values.
This makes sense as a transitional state, but as an end state it's a bit
weird. You'd expect to be able to mix null values and tombstones, and have
the null values remain and the tombstones get compacted. However what will
happen is both will be compacted and upon debugging this you'll learn that
we sometimes use null in the value to indicate tombstone. Ismael's solution
is a bigger compatibility break, though, so not sure if that is better.

My other question is the relationship to KIP-82. My read is that this KIP
solves some but not all of the problems KIP-82 is intended for. KIP-82, on
the other hand, seems to address most of the motivating uses for this KIP.
The exception is maybe item (5) on the list where you want to simultaneously
delete and convey some information to subscribers, but I couldn't construct
a concrete example for that one. Do we need to rationalize these two KIPs?
That is, do you still advocate this proposal if we do KIP-82 and vice
versa? As you may have noticed I'm somewhat emotionally invested in the
simplicity of the core data model, so my default position is let's try to
avoid stuffing more stuff in, but if we have to add stuff I like each of
these individually more than doing both. :-)

-Jay




On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce 
wrote:

> Hi Jay
>
> I disagree here that we are breaking any compatibility, we went through
> this on the discussion thread in fact with the help of that thread is how
> the kip came to the solution.
>
> Also on the supported combinations front you mention, we are not taking
> anything away afaik.
>
> Currently supported are only are:
> Null value = delete
> Non-null value = non delete
>
> With this kip we would support
> Null value + tombstone = delete
> Non null value + tombstone = delete
> Non null value + no tombstone = non delete
>
> As for the alternative idea, this is simply a new policy, how can there be
> confusion here? For this policy it would be explicit that tombstone marker
> would need to be set for a delete.
>
> I'm going to vent a little now as starting to get quite frustrated.
>
> We are going round in circles on kip-82 as per use cases there is now many
> use cases, how many more are needed? just because confluent don't see these
> doesn't mean they aren't real use cases others have, this is the point of
> the Apache foundation, it shouldn't be the view of just one organisation.
> It really is getting a feeling of the NIH syndrome. Rather than it being
> constructive on discussion of the implementation detail.
>
> kip-87 spawned from as on the kip call we all agreed this was needed. And
> would at least allow a custom wrapper be supported in a compacted topic,
> allowing meta data. Which again now I feel we are spinning wheels, and
> simply finding reasons not support it.
>
> Cheers
> Mike
>
>
>
> Sent using OWA for iPad
> 
> From: Jay Kreps 
> Sent: Friday, December 16, 2016 7:09:23 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>
> Hey Michael,
>
> I do think it might have been better had we started with a separate concept
> of null vs delete. Given that we are where we are, I'm not sure that the
> possible cures we explored so far are better than the disease.
>
> I apologize for coming to this late, but I didn't really understand the
> implications of the KIP and that we'd be breaking compatibility with
> existing apps until the vote had begun, and in my defense I'm not sure the
> other folks voting did either.
>
> I think we all agree there are many existing apps that are built with the
> assumption of "null value non-tombstone" and it isn't possible to
> disambiguate these from tombstones on the producer. It isn't that anyone is
> saying we have to support all four possibilities at once, it's that we
> simply can't orphan one of the existing combinations or our users will eat
> us!
>
> If I've understood your alternate solution of adding another setting for
> compaction, I think this does fix the compatibility problem, but it adds an
> odd mode the user has to add on all their topics. While the current state
> is easily explainable, the resulting state where the meaning of tombstone
> and null are overlapping and ambiguous and dependent on a compaction
> setting that could change out of band or not be in sync with your code in
> some environment seems worse to me then where we currently are. I think the
> question is how would this combination be explained to users and does it
> make sense?
>
> -Jay
>
>
>
> On Fri, Dec 16, 2016 at 9:25 AM, Michael Pearce 
> wrote:
>
> > Hi Ch
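To make the compatibility concern in this thread concrete, the following is roughly what many existing consumers of compacted topics do today; the handler and its callbacks are hypothetical application code, not a Kafka API:

{code}
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Today's convention: on a compacted topic, value == null *is* the delete
// marker. If producers started sending non-null values flagged as tombstones,
// this kind of handler would silently treat them as upserts.
// applyDelete/applyUpsert are hypothetical application callbacks.
public class CompactedTopicHandler {

    public void handle(ConsumerRecord<String, byte[]> record) {
        if (record.value() == null)
            applyDelete(record.key());
        else
            applyUpsert(record.key(), record.value());
    }

    private void applyDelete(String key) { /* drop the key from local state */ }

    private void applyUpsert(String key, byte[] value) { /* update local state */ }
}
{code}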

Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread Michael Pearce
Hi Jay,

Agreed this stemmed as offshoot from KIP-82.

Our main driver for that was to be able to attach some headers to a null-value
message for our routing, audit, tracing and a few other bits, which currently we
are forced to do with a message wrapper. If we all agreed on KIP-82 that we need
native headers, and looked to implement that, the push for this would dissipate.

This KIP would still allow for one use case that comes to mind, which is to have
business data sent along with a delete. Though, as said, that isn't something we
are pushing for or really think we would have.

As such in summary yes, if you want to fully support KIP-82 and we can get that 
agreed in principle and a target release version, I think quite a few guys at 
LinkedIn are quite pro it too ;) I'm happy to drop this one.

Cheers
Mike

Sent using OWA for iPhone

From: Jay Kreps 
Sent: Monday, December 19, 2016 8:51:23 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

Hey Michael,

Here is the compatibility concern I have:

   1. You have a consumer app that relies on value == null to indicate a
   delete (current semantics).
   2. You upgrade Kafka and your clients.
   3. Some producer starts using the tombstone field in combination with
   non-null.

I share Ismael's dislike of setting tombstones on records with null values.
This makes sense as a transitional state, but as an end state it's a bit
weird. You'd expect to be able to mix null values and tombstones, and have
the null values remain and the tombstones get compacted. However what will
happen is both will be compacted and upon debugging this you'll learn that
we sometimes use null in the value to indicate tombstone. Ismael's solution
is a bigger compatibility break, though, so not sure if that is better.

My other question is the relationship to KIP-82. My read is that this KIP
solves some but not all of the problems KIP-82 is intended for. KIP-82, on
the other hand, seems to address most of the motivating uses for this KIP.
The exception is maybe item (5) on the list where you want to simultaneously
delete and convey some information to subscribers, but I couldn't construct
a concrete example for that one. Do we need to rationalize these two KIPs?
That is, do you still advocate this proposal if we do KIP-82 and vice
versa? As you may have noticed I'm somewhat emotionally invested in the
simplicity of the core data model, so my default position is let's try to
avoid stuffing more stuff in, but if we have to add stuff I like each of
these individually more than doing both. :-)

-Jay




On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce 
wrote:

> Hi Jay
>
> I disagree here that we are breaking any compatibility, we went through
> this on the discussion thread in fact with the help of that thread is how
> the kip came to the solution.
>
> Also on the supported combinations front you mention, we are not taking
> anything away afaik.
>
> Currently supported are only are:
> Null value = delete
> Non-null value = non delete
>
> With this kip we would support
> Null value + tombstone = delete
> Non null value + tombstone = delete
> Non null value + no tombstone = non delete
>
> As for the alternative idea, this is simply a new policy, how can there be
> confusion here? For this policy it would be explicit that tombstone marker
> would need to be set for a delete.
>
> I'm going to vent a little now as starting to get quite frustrated.
>
> We are going round in circles on kip-82 as per use cases there is now many
> use cases, how many more are needed? just because confluent don't see these
> doesn't mean they aren't real use cases other have, this is the point of
> the Apache foundation, it shouldn't be the view of just one organisation.
> It really is getting a feeling of the NIH syndrome. Rather than it being
> constructive on discussion of the implementation detail.
>
> kip-87 spawned from as on the kip call we all agreed this was needed. And
> would at least allow a custom wrapper be supported in a compacted topic,
> allowing meta data. Which again now I feel we are spinning wheels, and
> simply finding reasons not support it.
>
> Cheers
> Mike
>
>
>
> Sent using OWA for iPad
> 
> From: Jay Kreps 
> Sent: Friday, December 16, 2016 7:09:23 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>
> Hey Michael,
>
> I do think it might have been better had we started with a separate concept
> of null vs delete. Given that we are where we are, I'm not sure that the
> possible cures we explored so far are better than the disease.
>
> I apologize for coming to this late, but I didn't really understand the
> implications of the KIP and that we'd be breaking compatibility with
> existing apps until the vote had begun, and in my defense I'm not sure the
> other folks voting did either.
>
> I think we all agree there are many existing apps 

Re: [DISCUSS] KIP 88: OffsetFetch Protocol Update

2016-12-19 Thread Vahid S Hashemian
Happy Monday,

Jason, thanks for further explaining the issue.

I have updated the KIP and reflected the recent discussions in there: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-88%3A+OffsetFetch+Protocol+Update
You can also see the modifications to the KIP compared to the approved 
version here: 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=66849788&selectedPageVersions=26&selectedPageVersions=25

Feedback and comments are welcome.

Thanks.
--Vahid



From:   Jason Gustafson 
To: dev@kafka.apache.org
Date:   12/16/2016 11:11 AM
Subject:Re: [DISCUSS] KIP 88: OffsetFetch Protocol Update



Thanks Vahid. To clarify the impact of this issue, since we have no way to
send an error code in the OffsetFetchResponse when requesting all offsets,
we cannot detect when the coordinator has moved to another broker or when
it is still in the process of loading the offsets. This means we cannot
tell if there was an error or if there were just no offsets stored for
the group. We've considered a few options:

1. Include an error code at the top level of the response. This seems like
the cleanest approach. The downside is that clients need to look in two
locations in the response for errors. One small benefit is that many
OffsetFetch errors are group-level, so in that case, we can save the need
to return responses for all the requested partitions.
2. Sort of hacky, but we could insert a "dummy" partition into the 
response
so that we have somewhere to return an error code.
3. Include no error code, but use a null array in the response to indicate
that there was some error. If there was no error, and the group simply had
no partitions, then we return an empty array. I guess in this case, if the
client receives a null array in the response, it should assume the worst
and rediscover the coordinator and try again.

My preference is the first one. Not sure if there are any other ideas?

-Jason
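For illustration only, option 1 would give the OffsetFetch response roughly this shape; the class and field names here are assumptions for the sketch, not the schema that eventually ships with KIP-88:

{code}
import java.util.Map;

// Sketch of option 1: a group-level error code at the top of the response,
// alongside the existing per-partition data. Names are illustrative only.
public class OffsetFetchResponseSketch {

    public static final short NONE = 0;

    public final short errorCode; // group/coordinator errors, e.g. NOT_COORDINATOR_FOR_GROUP
    public final Map<String, Map<Integer, Long>> committedOffsets; // topic -> partition -> offset

    public OffsetFetchResponseSketch(short errorCode,
                                     Map<String, Map<Integer, Long>> committedOffsets) {
        this.errorCode = errorCode;
        this.committedOffsets = committedOffsets;
    }

    public boolean safeToTreatEmptyAsNoOffsets() {
        // A client should only interpret an empty offset map as "the group has
        // no committed offsets" when the group-level error code is NONE;
        // otherwise it should rediscover the coordinator and retry.
        return errorCode == NONE;
    }
}
{code}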

On Thu, Dec 15, 2016 at 3:02 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi all,
>
> Even though KIP-88 was recently approved, due to a limitation that comes
> with the proposed protocol change in KIP-88 I'll have to re-open it to
> address the problem.
> I'd like to thank Jason Gustafson for catching this issue.
>
> I'll explain this in the KIP as well, but to summarize, KIP-88 suggests
> adding the option of passing a "null" array in FetchOffset request to
> query all existing offsets for a consumer group. It does not suggest any
> modification to FetchOffset response.
>
> In the existing protocol, group or coordinator related errors are 
reported
> along with each partition in the OffsetFetch response.
>
> If there are partitions in the request, they are guaranteed to appear in
> the response (there could be an error code associated with each). So if
> there is an error, it is reported back by being attached to some 
partition
> in the request.
> If an empty array is passed, no error is reported (no matter what the
> group or coordinator status is). The response comes back with an empty
> list.
>
> With the proposed change in KIP-88 we could have a scenario in which a
> null array is sent in FetchOffset request, and due to some errors (for
> example if coordinator just started and hasn't caught up yet with the
> offset topic), an empty list is returned in the FetchOffset response 
(the
> group may or may not actually be empty). The issue is in situations like
> this no error can be returned in the response because there is no
> partition to attach the error to.
>
> I'll update the KIP with more details and propose to add to OffsetFetch
> response schema an "error_code" at the top level that can be used to
> report group related errors (instead of reporting those errors with each
> individual partition).
>
> I apologize if this causes any inconvenience.
>
> Feedback and comments are always welcome.
>
> Thanks.
> --Vahid
>
>






[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-19 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762361#comment-15762361
 ] 

Apurva Mehta commented on KAFKA-4526:
-

I had a look at the logs from one of the failures, and here is the problem: 

# The test has two phases: one bulk producer phase, which seeds the topic with 
large enough quantities of data so that we can actually test throttled 
reassignment. The other phase is the regular produce-consume-validate loop. 
# We start the reassignment, and then run the produce-consume-validate loop to 
ensure that no new messages are lost during reassignment.
# Because the produce-consume-validate pattern uses structured (integer) data 
in phase two, we require that the consumer start from the end of the log and 
also start before the producer begins producing messages. If this is true, then 
the consumer will read and validate all the messages sent by the producer. The 
test has a `wait_until` block, but that only checks for the existence of the 
process. 
# What is seen in the logs is that the producer starts and begins producing 
messages _before_ the consumer fetches the metadata for all the partitions. As 
a result, the consumer misses the initial messages, which is consistent 
across all test failures. 
# This can be explained by the recent changes in ducktape: thanks to paramiko, 
running commands on worker machines is much faster since ssh connections are 
reused. Hence, the producer starts much faster than before, causing the initial 
set of messages to be missed by the consumer some of the time.
# The fix is to avoid using the PID of the consumer as a proxy for 'the 
consumer is ready'. Something  like 'partitions assigned' would be a more 
reliable proxy of the consumer being ready. Note that the original PR of the 
test had a timeout between consumer and producer start since there was no more 
robust method to ensure that the consumer was init'd before the producer 
started. But since the use of timeouts is --rightly!-- discouraged, it was 
removed. Adding suitable metrics would be a step in the right direction. 
# Next step is to leverage suitable metrics (like partitions assigned if it 
exists), or add them to the console consumer to ensure that it is init'd before 
continuing to start the producer.
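The readiness condition described above can be expressed against the Java consumer API roughly as follows; the actual fix belongs in the ducktape system test (polling console-consumer metrics), so this is only a sketch of the condition, not the test code:

{code}
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// "Consumer is ready" should mean "the coordinator has assigned it
// partitions", not "the process has a PID". Sketch only.
public class WaitForAssignment {

    public static void waitUntilAssigned(KafkaConsumer<byte[], byte[]> consumer,
                                         String topic, long timeoutMs) {
        consumer.subscribe(Collections.singletonList(topic));
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (consumer.assignment().isEmpty()) {
            if (System.currentTimeMillis() > deadline)
                throw new IllegalStateException("Consumer not assigned partitions within " + timeoutMs + " ms");
            consumer.poll(100); // drives the join-group / rebalance protocol
        }
        // Only now is it safe to start the producer side of the
        // produce-consume-validate loop.
    }
}
{code}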

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2276: MINOR: Add more exception information in Processor...

2016-12-19 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/2276

MINOR: Add more exception information in ProcessorStateManager



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka KMinor-exception-message

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2276


commit a9601d82693b9526c7d52250d0e588c90d9b6a59
Author: Guozhang Wang 
Date:   2016-12-19T21:30:12Z

Add more exception information in ProcessorStateManager




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762372#comment-15762372
 ] 

ASF GitHub Bot commented on KAFKA-4527:
---

GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2277

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2277


commit 9911f25e2b06a02c56deadf7c586b5c263a08027
Author: Shikhar Bhushan 
Date:   2016-12-19T21:32:26Z

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue




> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where 
> paused connector produces messages
> ---
>
> Key: KAFKA-4527
> URL: https://issues.apache.org/jira/browse/KAFKA-4527
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> {quote}
> 
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time:   40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 257, in test_pause_and_resume_sink
> assert num_messages == len(self.sink.received_messages()), "Paused sink 
> connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  but it has also happened before, e.g. 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get 
> the number of messages produced/consumed during the test is flawed -- I think 
> we may not account for additional buffering between the connectors and the 
> process reading their output to determine what they have produced. However, 
> that's just a theory -- the minimal checking on the logs that I did didn't 
> reveal anything obviously wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2277: KAFKA-4527: task status was being updated before a...

2016-12-19 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2277

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2277


commit 9911f25e2b06a02c56deadf7c586b5c263a08027
Author: Shikhar Bhushan 
Date:   2016-12-19T21:32:26Z

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-19 Thread Guozhang Wang
Avi,

I have granted you the permissions under apache id (avi).

Guozhang

On Thu, Dec 15, 2016 at 3:40 PM, Matthias J. Sax 
wrote:

> What is you wiki ID? We can grant you permission.
>
> -Matthias
>
> On 12/15/16 3:27 PM, Avi Flax wrote:
> >
> >> On Dec 13, 2016, at 21:02, Matthias J. Sax 
> wrote:
> >>
> >> thanks for your feedback.
> >
> > My pleasure!
> >
> >> We want to enlarge the scope for Streams
> >> application and started to collect use cases in the Wiki:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Data+%28Re%29Processing+Scenarios
> >
> > This page looks great, I love this sort of thing. Nice work.
> >
> >> Feel free to add there via editing the page or writing a comment.
> >
> > Thanks, but I don’t appear to have permissions to edit or comment on the
> page.
> >
> > Perhaps you could paste in the two use cases I described as a new
> comment?
> >
> > Thanks,
> > Avi
> >
>
>


-- 
-- Guozhang


Re: [VOTE] 0.10.1.1 RC1

2016-12-19 Thread Gwen Shapira
+1 (binding)

Validated signatures
Ran tests
Built from source distro
Tested binaries using the quickstart guide

Gwen

On Thu, Dec 15, 2016 at 1:29 PM, Guozhang Wang  wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the second, and hopefully the last candidate for the release of
> Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
> includes fixes and improvements from 30 JIRAs. See the release notes for
> more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 20 December, 8pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> NOTE the artifacts include the ones built from Scala 2.12.1 and Java8,
> which are treated a pre-alpha artifacts for the Scala community to try and
> test it out:
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.12/0.10.1.1/
>
> We will formally add the scala 2.12 support in future minor releases.
>
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c3638376708ee6c02dfe4e57747acae0126fa6e7
>
>
> Thanks,
> Guozhang
>
> --
> -- Guozhang



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Build failed in Jenkins: kafka-trunk-jdk7 #1764

2016-12-19 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4534: StreamPartitionAssignor only ever updates the

--
[...truncated 17439 lines...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] FAILED
java.lang.AssertionError: Condition not met within timeout 6. Did not 
receive 1 number of records
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:283)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinValuesRecordsReceived(IntegrationTestUtils.java:251)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.waitUntilAtLeastNumRecordProcessed(QueryableStateIntegrationTest.java:669)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:350)

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
tes

[GitHub] kafka pull request #2278: KAFKA-4526 - Disable throttling test until it can ...

2016-12-19 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/2278

KAFKA-4526 - Disable throttling test until it can be fixed correctly.

At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.

If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.

At present, the consumer is considered inited once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-4526-throttling-test-failures

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2278


commit d7a0e0b9b69e52ca222f18409b3edb6663db0135
Author: Apurva Mehta 
Date:   2016-12-19T22:19:29Z

KAFKA-4526 - Disable throttling test until it can be fixed correctly.

At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.

If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.

At present, the consumer is considered inited once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762495#comment-15762495
 ] 

ASF GitHub Bot commented on KAFKA-4526:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/2278

KAFKA-4526 - Disable throttling test until it can be fixed correctly.

At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.

If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.

At present, the consumer is considered inited once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-4526-throttling-test-failures

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2278


commit d7a0e0b9b69e52ca222f18409b3edb6663db0135
Author: Apurva Mehta 
Date:   2016-12-19T22:19:29Z

KAFKA-4526 - Disable throttling test until it can be fixed correctly.

At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.

If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.

At present, the consumer is considered inited once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.




> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-102 - Add close with timeout for consumers

2016-12-19 Thread Guozhang Wang
+1 on this idea as well.

Streams has also added a similar feature itself partly because consumer
does not support it directly (other part of the reason is that like
brokers, streams also have some exception handling logic which could lead
to deadlock with careless System.exit). For consumer itself I think the
trickiness lies in the prefetching calls as well as commit / HB requests
cleanup with the timeout, and I agree with Ewen that it's better to be
merged in the early release cycle than a last minute merge.



Guozhang
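For reference, the proposed API mirrors the producer's close(timeout, unit). A minimal usage sketch, assuming the signature lands as proposed in the KIP:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Bounded shutdown as proposed by KIP-102: try to commit offsets, leave the
// group and release resources, but never block longer than the timeout.
// Sketch only; the exact signature is whatever the KIP finally merges.
public class BoundedConsumerShutdown {

    public static void shutdown(KafkaConsumer<?, ?> consumer) {
        try {
            consumer.close(5, TimeUnit.SECONDS);
        } catch (RuntimeException e) {
            // Best effort: the process is exiting anyway, so just log it.
            System.err.println("Consumer close failed: " + e);
        }
    }
}
{code}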

On Mon, Dec 19, 2016 at 4:18 AM, Rajini Sivaram 
wrote:

> Thank you for the reviews.
>
> @Becket @Ewen, Agree that making all blocking calls have a timeout will be
> trickier and hence the scope of this KIP is limited to close().
>
> @Jay Yes, this should definitely go into release notes, will make sure it
> is added. I will add some integration tests with broker failures for
> testing the timeout, but they cannot completely eliminate the risk of a
> hang. Over time, hopefully system tests will help catch most issues.
>
>
> On Sat, Dec 17, 2016 at 1:15 AM, Jay Kreps  wrote:
>
> > I think this is great. Sounds like one implication is that existing code
> > that called close() and hit the timeout would now hang indefinitely. We
> saw
> > this kind of thing a lot in automated testing scenarios where people
> don't
> > correctly sequence their shutdown of client and server. I think this is
> > okay, but might be good to include in the release notes.
> >
> > -jay
> >
> > On Thu, Dec 15, 2016 at 5:32 AM, Rajini Sivaram 
> > wrote:
> >
> > Hi all,
> >
> > I have just created KIP-102 to add a new close method for consumers with a
> > timeout parameter, making Consumer consistent with Producer:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-102+-+Add+close+with+timeout+for+consumers
> >
> > Comments and suggestions are welcome.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>
>
>
> --
> Regards,
>
> Rajini
>



-- 
-- Guozhang


Re: [VOTE] KIP-100 - Relax Type constraints in Kafka Streams API

2016-12-19 Thread Guozhang Wang
+1.

On Sat, Dec 17, 2016 at 3:27 AM, Ismael Juma  wrote:

> Thanks Xavier. +1 (binding)
>
> Ismael
>
> On Fri, Dec 16, 2016 at 8:15 PM, Xavier Léauté 
> wrote:
>
> > Ismael made a good point so I updated KIP-100 and expanded its scope to
> > include covariant result types for functions applied to streams.
> > I will update the discussion thread accordingly.
> >
> > On Tue, Dec 13, 2016 at 12:13 AM Ismael Juma  wrote:
> >
> > > Hi Xavier,
> > >
> > > Thanks for the KIP. If Java had declaration site variance (proposed
> for a
> > > future Java version[1]), we'd mark function parameters as contravariant
> > > (i.e. "super") and the result as covariant (i.e. "extends"). In the
> > > meantime, we have to use the wildcards at use site as per your
> proposal.
> > > However, it seems that only the first case is covered by your proposal.
> > > This is an improvement, but is there any reason not to do the latter as
> > > well? It would be good to get it completely right this time.
> > >
> > > Ismael
> > >
> > > [1] http://openjdk.java.net/jeps/300
> > >
> > > On Fri, Dec 9, 2016 at 6:27 PM, Xavier Léauté 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I would like to start the vote for KIP-100 unless there are any more
> > > > comments.
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-100+-+
> > > > Relax+Type+constraints+in+Kafka+Streams+API
> > > >
> > > > corresponding PR here https://github.com/apache/kafka/pull/2205
> > > >
> > > > Thanks,
> > > > Xavier
> > > >
> > >
> >
>



-- 
-- Guozhang
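To illustrate the use-site variance discussed in this thread with a toy generic method (this is not the exact KStream signature changed by KIP-100, just the "? super" / "? extends" pattern it adopts):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy example of use-site variance: the mapper's input type is contravariant
// ("? super V") and its result type is covariant ("? extends VR"), so more
// general functions become acceptable arguments.
public class VarianceSketch {

    static <V, VR> List<VR> mapValues(List<V> values, Function<? super V, ? extends VR> mapper) {
        List<VR> out = new ArrayList<>();
        for (V v : values)
            out.add(mapper.apply(v));
        return out;
    }

    public static void main(String[] args) {
        // A Function<Object, Integer> is accepted where the values are Strings
        // and Numbers are wanted; an invariant Function<String, Number>
        // parameter would have rejected it.
        Function<Object, Integer> hash = Object::hashCode;
        List<Number> hashes = mapValues(Arrays.asList("a", "b"), hash);
        System.out.println(hashes);
    }
}
{code}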


[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762516#comment-15762516
 ] 

Jason Gustafson commented on KAFKA-4166:


Seems like MM is taking a little too long to stop when using the old consumer. 
From the logs, it's bumping right up against the 10 second limit nearly every 
time. Maybe we can just bump that timeout to 30 seconds.

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread Michael Pearce
Wow, just read that back; I was definitely over-tired. Hopefully it makes sense,
or you get the gist at least.


From: Michael Pearce 
Sent: Monday, December 19, 2016 9:19:02 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

Hi Jay,

Agreed this stemmed as offshoot from KIP-82.

Which our main driver for was to be able to have some headers for a null value 
as such for our routing, audit, tracing and a few other bits which currently we 
are forced to do with a message wrapper, if we all agreed on KIP-82 that we 
need native headers and look to implement that the push for this would 
dissipate.

This KIP would allow for though one use case that comes to mind we could see 
which is to have business data with a delete. Though as said this isn't 
something we are pushing for think really we would have.

As such in summary yes, if you want to fully support KIP-82 and we can get that 
agreed in principle and a target release version, I think quite a few guys at 
LinkedIn are quite pro it too ;) I'm happy to drop this one.

Cheers
Mike

Sent using OWA for iPhone

From: Jay Kreps 
Sent: Monday, December 19, 2016 8:51:23 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

Hey Michael,

Here is the compatibility concern I have:

   1. You have a consumer app that relies on value == null to indicate a
   delete (current semantics).
   2. You upgrade Kafka and your clients.
   3. Some producer starts using the tombstone field in combination with
   non-null.

I share Ismael's dislike of setting tombstones on records with null values.
This makes sense as a transitional state, but as an end state it's a bit
weird. You'd expect to be able to mix null values and tombstones, and have
the null values remain and the tombstones get compacted. However what will
happen is both will be compacted and upon debugging this you'll learn that
we sometimes use null in the value to indicate tombstone. Ismael's solution
is a bigger compatibility break, though, so not sure if that is better.

My other question is the relationship to KIP-82. My read is that this KIP
solves some but not all of the problems KIP-82 is intended for. KIP-82, on
the other hand, seems to address most of the motivating uses for this KIP.
The exception is maybe item (5) on the list where you want to simultaneously
delete and convey some information to subscribers, but I couldn't construct
a concrete example for that one. Do we need to rationalize these two KIPs?
That is, do you still advocate this proposal if we do KIP-82 and vice
versa? As you may have noticed I'm somewhat emotionally invested in the
simplicity of the core data model, so my default position is let's try to
avoid stuffing more stuff in, but if we have to add stuff I like each of
these individually more than doing both. :-)

-Jay




On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce 
wrote:

> Hi Jay
>
> I disagree here that we are breaking any compatibility, we went through
> this on the discussion thread in fact with the help of that thread is how
> the kip came to the solution.
>
> Also on the supported combinations front you mention, we are not taking
> anything away afaik.
>
> Currently supported are only are:
> Null value = delete
> Non-null value = non delete
>
> With this kip we would support
> Null value + tombstone = delete
> Non null value + tombstone = delete
> Non null value + no tombstone = non delete
>
> As for the alternative idea, this is simply a new policy, how can there be
> confusion here? For this policy it would be explicit that tombstone marker
> would need to be set for a delete.
>
> I'm going to vent a little now as starting to get quite frustrated.
>
> We are going round in circles on kip-82 as per use cases there is now many
> use cases, how many more are needed? just because confluent don't see these
> doesn't mean they aren't real use cases other have, this is the point of
> the Apache foundation, it shouldn't be the view of just one organisation.
> It really is getting a feeling of the NIH syndrome. Rather than it being
> constructive on discussion of the implementation detail.
>
> kip-87 spawned from as on the kip call we all agreed this was needed. And
> would at least allow a custom wrapper be supported in a compacted topic,
> allowing meta data. Which again now I feel we are spinning wheels, and
> simply finding reasons not support it.
>
> Cheers
> Mike
>
>
>
> Sent using OWA for iPad
> 
> From: Jay Kreps 
> Sent: Friday, December 16, 2016 7:09:23 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>
> Hey Michael,
>
> I do think it might have been better had we started with a separate concept
> of null vs delete. Given that we are where we are, I'm not sure that the
> possible cures we explored so far are better than the disease.
>
> I apologize for coming 

[GitHub] kafka pull request #2279: KAFKA-4166: Fix transient MM failure caused by slo...

2016-12-19 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2279

KAFKA-4166: Fix transient MM failure caused by slow old consumer shutdown



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4166

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2279


commit bb274738122fcea2d8540e9599436dfdbfda0b0c
Author: Jason Gustafson 
Date:   2016-12-19T22:46:02Z

KAFKA-4166: Fix transient MM failure caused by slow old consumer shutdown




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #1112

2016-12-19 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4534: StreamPartitionAssignor only ever updates the

[jason] KAFKA-4553; Improve round robin assignment in Connect to avoid uneven

--
[...truncated 17463 lines...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] FAILED
java.lang.AssertionError: Condition not met within timeout 6. Did not 
receive 1 number of records
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:283)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinValuesRecordsReceived(IntegrationTestUtils.java:251)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.waitUntilAtLeastNumRecordProcessed(QueryableStateIntegrationTest.java:669)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:350)

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted 

[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762540#comment-15762540
 ] 

ASF GitHub Bot commented on KAFKA-4166:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2279

KAFKA-4166: Fix transient MM failure caused by slow old consumer shutdown



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4166

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2279


commit bb274738122fcea2d8540e9599436dfdbfda0b0c
Author: Jason Gustafson 
Date:   2016-12-19T22:46:02Z

KAFKA-4166: Fix transient MM failure caused by slow old consumer shutdown




> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] 0.10.1.1 RC1

2016-12-19 Thread Jun Rao
Hi, Guozhang,

Thanks for preparing the release. Verified quickstart. +1

Jun

On Thu, Dec 15, 2016 at 1:29 PM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second, and hopefully the last candidate for the release of
> Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
> includes fixes and improvements from 30 JIRAs. See the release notes for
> more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 20 December, 8pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> NOTE the artifacts include the ones built from Scala 2.12.1 and Java8,
> which are treated as pre-alpha artifacts for the Scala community to try and
> test it out:
>
> https://repository.apache.org/content/groups/staging/org/
> apache/kafka/kafka_2.12/0.10.1.1/
>
> We will formally add the scala 2.12 support in future minor releases.
>
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> c3638376708ee6c02dfe4e57747acae0126fa6e7
>
>
> Thanks,
> Guozhang
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread Jay Kreps
Makes sense!

-Jay

On Mon, Dec 19, 2016 at 2:40 PM, Michael Pearce 
wrote:

> Wow, just read that back and I was definitely over-tired. Hopefully it makes
> sense, or you at least get the gist.
>
> 
> From: Michael Pearce 
> Sent: Monday, December 19, 2016 9:19:02 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>
> Hi Jay,
>
> Agreed this stemmed as offshoot from KIP-82.
>
> Our main driver for that was to be able to attach some headers to a null
> value, for our routing, audit, tracing and a few other bits which we are
> currently forced to handle with a message wrapper. If we all agreed on
> KIP-82, that we need native headers and will look to implement them, the
> push for this would dissipate.
>
> This KIP would, though, allow for one use case that comes to mind: having
> business data accompany a delete. As said, though, this isn't something we
> are pushing for or really think we would have.
>
> As such in summary yes, if you want to fully support KIP-82 and we can get
> that agreed in principle and a target release version, I think quite a few
> guys at LinkedIn are quite pro it too ;) I'm happy to drop this one.
>
> Cheers
> Mike
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Monday, December 19, 2016 8:51:23 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>
> Hey Michael,
>
> Here is the compatibility concern I have:
>
>1. You have a consumer app that relies on value == null to indicate a
>delete (current semantics).
>2. You upgrade Kafka and your clients.
>3. Some producer starts using the tombstone field in combination with
>non-null.
>
> I share Ismael's dislike of setting tombstones on records with null values.
> This makes sense as a transitional state, but as an end state it's a bit
> weird. You'd expect to be able to mix null values and tombstones, and have
> the null values remain and the tombstones get compacted. However, what will
> happen is that both will be compacted, and upon debugging this you'll learn that
> we sometimes use null in the value to indicate tombstone. Ismael's solution
> is a bigger compatibility break, though, so not sure if that is better.
>
> My other question is the relationship to KIP-82. My read is that this KIP
> solves some but not all of the problems KIP-82 is intended for. KIP-82, on
> the other hand, seems to address most of the motivating uses for this KIP.
> The exception is maybe item (5) on the list, where you want to simultaneously
> delete and convey some information to subscribers, but I couldn't construct
> a concrete example for that one. Do we need to rationalize these two KIPs?
> That is, do you still advocate this proposal if we do KIP-82 and vice
> versa? As you may have noticed I'm somewhat emotionally invested in the
> simplicity of the core data model, so my default position is let's try to
> avoid stuffing more stuff in, but if we have to add stuff I like each of
> these individually more than doing both. :-)
>
> -Jay
>
>
>
>
> On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce 
> wrote:
>
> > Hi Jay
> >
> > I disagree here that we are breaking any compatibility, we went through
> > this on the discussion thread in fact with the help of that thread is how
> > the kip came to the solution.
> >
> > Also on the supported combinations front you mention, we are not taking
> > anything away afaik.
> >
> > Currently supported are only are:
> > Null value = delete
> > Non-null value = non delete
> >
> > With this kip we would support
> > Null value + tombstone = delete
> > Non null value + tombstone = delete
> > Non null value + no tombstone = non delete
> >
> > As for the alternative idea, this is simply a new policy, how can there
> be
> > confusion here? For this policy it would be explicit that tombstone
> marker
> > would need to be set for a delete.
> >
> > I'm going to vent a little now, as I'm starting to get quite frustrated.
> >
> > We are going round in circles on KIP-82. As for use cases, there are now
> > many use cases; how many more are needed? Just because Confluent doesn't
> > see these doesn't mean they aren't real use cases that others have. This
> > is the point of the Apache foundation: it shouldn't be the view of just
> > one organisation. It really is starting to feel like NIH syndrome, rather
> > than a constructive discussion of the implementation detail.
> >
> > KIP-87 spawned from the KIP call, where we all agreed this was needed and
> > would at least allow a custom wrapper to be supported in a compacted topic,
> > allowing metadata. Yet again I now feel we are spinning wheels and simply
> > finding reasons not to support it.
> >
> > Cheers
> > Mike
> >
> >
> >
> > Sent using OWA for iPad
> > 
> > From: Jay Kreps 
> > Sent: Friday, December 16, 2016 7:09:23 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KI
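For reference, the compatibility concern discussed in this thread hinges on today's behaviour, where a null value is the only delete signal a compacted topic understands. Below is a minimal Java sketch of those current semantics; the broker address, topic and key are illustrative, not taken from the thread.

{code}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class TombstoneToday {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Current semantics: a null value is the only way to mark a delete,
            // so log compaction will eventually drop all records for this key.
            producer.send(new ProducerRecord<>("compacted-topic", "user-42", null));
        }
    }
}
{code}

A consumer built against these semantics checks record.value() == null to detect deletes, which is exactly the check that a non-null value carrying a separate tombstone flag would silently bypass.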

[jira] [Created] (KAFKA-4558) throttling_test fails if the producer starts too fast.

2016-12-19 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-4558:
---

 Summary: throttling_test fails if the producer starts too fast.
 Key: KAFKA-4558
 URL: https://issues.apache.org/jira/browse/KAFKA-4558
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Apurva Mehta


As described in https://issues.apache.org/jira/browse/KAFKA-4526, the 
throttling test will fail if the producer in the produce-consume-validate loop 
starts up before the consumer is fully initialized.

We need to block the start of the producer until the consumer is ready to go. 

The current plan is to poll the consumer for a particular metric (like, for 
instance, partition assignment) which will act as a good proxy for successful 
initialization. Currently, we just check for the existence of a process with 
the PID, which is not a strong enough check, causing the test to fail 
intermittently. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
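For illustration, here is a minimal Java sketch of the readiness check the ticket describes, expressed against the Java consumer API rather than the ducktape harness the system test actually uses: block until the consumer has a non-empty partition assignment before letting the producer start. The group, topic and broker names are made up for the example.

{code}
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class WaitForConsumerReady {
    // Returns once the consumer has been assigned at least one partition, or
    // throws after the timeout: the "partition assignment as readiness proxy"
    // idea from the ticket, instead of only checking for a live PID.
    static void awaitAssignment(KafkaConsumer<?, ?> consumer, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (consumer.assignment().isEmpty()) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException(
                        "Consumer got no partitions within " + timeoutMs + " ms");
            }
            consumer.poll(100);  // drives the group join / rebalance; records from
                                 // this warm-up poll are ignored in the sketch
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative
        props.put("group.id", "throttling-test");           // illustrative
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            awaitAssignment(consumer, 30_000);
            // ...only now would the producer side of the test be started...
        }
    }
}
{code}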


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread Guozhang Wang
One more thing about the double journal proposal: when we discussed this
method back at LinkedIn, another issue raised besides the double writing was
that it voids offset ordering and forces people to accept "transaction
ordering", that is, consumers will not see messages from the same partition
in the order in which they were produced, but only in the order in which the
corresponding transactions were committed. For some scenarios, we believe
that offset ordering would still be preferred over transaction ordering, and
that is why the KIP-98 proposal defaults to the former while leaving the door
open for users who want to switch to the latter.


Guozhang
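
For readers following along, a minimal sketch of how a consumer would choose between the two behaviours discussed above, using the configuration names from the KIP-98 proposal. These names are proposal-stage and hypothetical until the feature actually ships, and the broker/group values are illustrative.

{code}
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Properties;

public class IsolationLevelSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative
        props.put("group.id", "kip98-demo");                // illustrative
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Default per the proposal: offset ordering, records are visible as
        // soon as they are written ("read uncommitted").
        props.put("isolation.level", "read_uncommitted");
        // Opt-in alternative: only data from committed transactions is handed
        // out, at the cost of the extra latency discussed in this thread.
        // props.put("isolation.level", "read_committed");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}
{code}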

On Mon, Dec 19, 2016 at 10:56 AM, Jay Kreps  wrote:

> Hey Radai,
>
> I'm not sure if I fully understand what you are proposing, but I
> interpreted it to be similar to a proposal we worked through back at
> LinkedIn. The proposal was to commit to a central txlog topic, and then
> recopy to the destination topic upon transaction commit. The observations on
> that approach at the time were the following:
>
>1. It is cleaner since the output topics have only committed data!
>2. You need full replication on the txlog topic to ensure atomicity. We
>weren't able to come up with a solution where you buffer in memory or
> use
>renaming tricks the way you are describing. The reason is that once you
>begin committing you must ensure that the commit eventually succeeds to
>guarantee atomicity. If you use a transient store you might commit some
>data and then have a server failure that causes you to lose the rest of
> the
>transaction.
>3. Having a single log allows the reader to choose a "read uncommitted"
>mode that hands out messages immediately. This is important for cases
> where
>latency is important, especially for stream processing topologies where
>these latencies stack up across multiple stages.
>
> For the stream processing use case, item (2) is a bit of a deal killer.
> This takes the cost of a transient message write (say the intermediate
> result of a stream processing topology) from 3x writes (assuming 3x
> replication) to 6x writes. This means you basically can't default it on. If
> we can in fact get the cost down to a single buffered write (i.e. 1x the
> data is written to memory and buffered to disk if the transaction is large)
> as in the KIP-98 proposal without too many other negative side effects I
> think that could be compelling.
>
> -Jay
>
>
>
> On Mon, Dec 19, 2016 at 9:36 AM, radai  wrote:
>
> > regarding efficiency:
> >
> > I'd like to distinguish between server efficiency (resource utilization
> of
> > the broker machine alone) and overall network efficiency (resource
> > utilization on brokers, producers and consumers, including network
> > traffic).
> > my proposal is not as resource-efficient on the broker (although it can
> be,
> > depends on a few trade offs and implementation details). HOWEVER, if i
> look
> > at the overall efficiency:
> >
>1. clients would need to either buffer or double-read uncommitted msgs.
> for N clients reading the stream M times (after re-starts and reconsumes)
> this would mean an M*N factor in either network BW or disk/memory space
> (depending on buffer vs re-read), and potentially N*M more broker-side reads
> too.
>2. to reduce the broker-side cost several things can be done (this is not
> an either-or list, these are cumulative):
> >   2.1 - keep TX logs in mem (+overflow to disk) - trades disk writes
> > for TX resiliency
> >   2.2 - when "appending" TX logs to real partitions - instead of
> > reading from (disk-based) TX log and writing to partition log (x2 disk
> > writes) the TX log can be made a segment file (so file rename, with
> > associated protocol changes). this would avoid double writing by simply
> > making the TX file part of the partition (for large enough TXs. smaller
> > ones can be rewritten).
> >   2.3 - the approach above could be combined with a background
> "defrag"
> > - similar in concept to compaction - to further reduce the total of
> > resulting number of files.
> >
> > I think my main issue with the current proposal, more important than
> > performance, is lack of proper "encapsulation" of transactions - I dont
> > think downstream consumers should see uncommitted msgs. ever.
> >
> >
> > On Sun, Dec 18, 2016 at 10:19 PM, Becket Qin 
> wrote:
> >
> > > @Jason
> > >
> > > Yes, second thought on the number of messages included, the offset
> delta
> > > will probably be sufficient. The use case I encounter before for number
> > of
> > > messages in a message set is an embedded mirror maker on the
> destination
> > > broker side which fetches message directly from the source cluster.
> > Ideally
> > > the destination cluster only needs to check CRC and assign the offsets
> > > because all the message verification has been done by the source
> cluster,
> > > but due to the lack of the number of messages

[jira] [Assigned] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3808:
--

Assignee: Ismael Juma

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>Assignee: Ismael Juma
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4559) Add a site search bar on the Web site

2016-12-19 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4559:


 Summary: Add a site search bar on the Web site
 Key: KAFKA-4559
 URL: https://issues.apache.org/jira/browse/KAFKA-4559
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Guozhang Wang


As titled: since we are breaking the "documentation" html into sub-spaces and 
sub-pages, people can no longer simply use `control + f` on a single page, and 
a site-scoped search bar would help in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1765

2016-12-19 Thread Apache Jenkins Server
See 



[GitHub] kafka pull request #2279: KAFKA-4166: Fix transient MM failure caused by slo...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2279


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4166:
---
Fix Version/s: 0.10.2.0

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
> Fix For: 0.10.2.0
>
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4166:
---
Status: Patch Available  (was: Open)

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
> Fix For: 0.10.2.0
>
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762654#comment-15762654
 ] 

ASF GitHub Bot commented on KAFKA-4166:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2279


> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
> Fix For: 0.10.2.0
>
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4166:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
> Fix For: 0.10.2.0
>
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2278: KAFKA-4526 - Disable throttling test until it can ...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2278


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762663#comment-15762663
 ] 

ASF GitHub Bot commented on KAFKA-4526:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2278


> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
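For context on the ack question raised in the description above, this is a minimal Java sketch of the producer-side setting being discussed; the broker address and topic name are illustrative. With acks=all the send is only acknowledged once all in-sync replicas have the record, whereas acks=0 never waits for the broker at all, which is the weaker setting the description is worried about.

{code}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class AcksAllSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // acks=all: the send is acknowledged only after every in-sync replica
        // has the record, so an acked message is expected to survive a
        // reassignment or broker bounce. acks=0 would return immediately
        // without any broker acknowledgment.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Block on the future to surface any send failure in this sketch.
            producer.send(new ProducerRecord<>("throttle-test-topic", "key", "value")).get();
        }
    }
}
{code}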


[jira] [Resolved] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-19 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4526.

Resolution: Fixed

Issue resolved by pull request 2278
[https://github.com/apache/kafka/pull/2278]

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4560:
--

 Summary: Min / Max Partitions Fetch Records params
 Key: KAFKA-4560
 URL: https://issues.apache.org/jira/browse/KAFKA-4560
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.10.0.1
Reporter: Stephane Maarek


There is currently a `max.partition.fetch.bytes` parameter to limit the total 
size of the fetch call (also a min).

Sometimes I'd like to control how many records altogether I'm getting at the 
time and I'd like to see a `max.partition.fetch.records` (also a min).

If both are specified the first condition that is met would complete the fetch 
call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2275: Fix exception handling in case of file record trun...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2275


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #1113

2016-12-19 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-4166; Fix transient MM failure caused by slow old consumer

[ismael] KAFKA-4526; Disable throttling test until it can be fixed correctly.

--
[...truncated 7936 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffse

[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762931#comment-15762931
 ] 

huxi commented on KAFKA-4560:
-

There is a consumer config named "max.poll.records" that controls the maximum 
number of records returned in a single call to poll(). Is that what you want?

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762979#comment-15762979
 ] 

Stephane Maarek commented on KAFKA-4560:


Hi [~huxi_2b], this setting has the following description: "The maximum number 
of records returned in a single call to poll()." 

This doesn't affect how many records are ultimately returned per partition, it 
just affects how many records are returned by each poll() call within a loop. 
As you can see at 
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L991
that code isn't affected by the max.partition.fetch.bytes parameter, which is 
probably applied in a higher-level wrapper call (I can't find it).

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2191: KAFKA-4447: Controller resigned but it also acts a...

2016-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2191


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763005#comment-15763005
 ] 

ASF GitHub Bot commented on KAFKA-4447:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2191


> Controller resigned but it also acts as a controller for a long time 
> -
>
> Key: KAFKA-4447
> URL: https://issues.apache.org/jira/browse/KAFKA-4447
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Environment: Linux Os
>Reporter: Json Tu
>  Labels: reliability
> Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we executed the following operations:
> 1. we executed some topic partition reassignments from one node to the other 
> 9 nodes in the cluster, which triggered the controller.
> 2. the controller invoked PartitionsReassignedListener's handleDataChange, 
> read all partition reassignment rules from the zk path, and executed 
> onPartitionReassignment for every partition that matched the conditions.
> 3. but the controller's zk session expired, after which some of the 9 nodes 
> also expired from zk.
> 5. then the controller invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as the controller 
> for about 3 minutes, which can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time

2016-12-19 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-4447:

Fix Version/s: 0.10.2.0

> Controller resigned but it also acts as a controller for a long time 
> -
>
> Key: KAFKA-4447
> URL: https://issues.apache.org/jira/browse/KAFKA-4447
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Environment: Linux Os
>Reporter: Json Tu
>  Labels: reliability
> Fix For: 0.10.2.0
>
> Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we executed the following operations:
> 1. we executed some topic partition reassignments from one node to the other 
> 9 nodes in the cluster, which triggered the controller.
> 2. the controller invoked PartitionsReassignedListener's handleDataChange, 
> read all partition reassignment rules from the zk path, and executed 
> onPartitionReassignment for every partition that matched the conditions.
> 3. but the controller's zk session expired, after which some of the 9 nodes 
> also expired from zk.
> 5. then the controller invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as the controller 
> for about 3 minutes, which can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763027#comment-15763027
 ] 

huxi commented on KAFKA-4560:
-

Not sure I follow your meaning. You want a config controlling the maximum 
number of records the consumer fetches per partition? It seems no such config 
is offered. 
cc [~hachikuji]

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763036#comment-15763036
 ] 

Stephane Maarek commented on KAFKA-4560:


Basically, if you set the max bytes, each partition will fetch records until 
the max bytes limit is met. 

My ticket is a feature request to offer a similar parameter, named 
max.partitions.fetch.records, which would limit the number of records fetched 
per partition on every request. 

The requirement isn't addressed by max.poll.records.

Makes sense?

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
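To make the distinction in this thread concrete, below is a minimal Java consumer sketch showing the two existing knobs (max.partition.fetch.bytes and max.poll.records) next to the per-partition record limit the ticket requests, which does not exist today and is shown only as a commented-out hypothetical. Broker, group and topic names are illustrative.

{code}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class FetchLimitsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative
        props.put("group.id", "fetch-limits-demo");         // illustrative
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Existing knobs: a per-partition byte cap applied to each fetch, and
        // a cap on how many records a single poll() hands to the application.
        props.put("max.partition.fetch.bytes", "1048576");
        props.put("max.poll.records", "500");

        // What the ticket asks for (hypothetical, does not exist today):
        // a per-partition *record* cap applied to each fetch request.
        // props.put("max.partition.fetch.records", "100");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            System.out.println("Fetched " + records.count() + " records");
        }
    }
}
{code}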


Build failed in Jenkins: kafka-trunk-jdk7 #1767

2016-12-19 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Fix exception handling in case of file record truncation during

--
[...truncated 7927 lines...]

kafka.log.LogTest > testReadWithTooSmallMaxLength STARTED

kafka.log.LogTest > testReadWithTooSmallMaxLength PASSED

kafka.log.LogTest > testOverCompactedLogRecovery STARTED

kafka.log.LogTest > testOverCompactedLogRecovery PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved STARTED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages STARTED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload STARTED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog STARTED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset STARTED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate STARTED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName STARTED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles STARTED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testRebuildTimeIndexForOldMessages STARTED

kafka.log.LogTest > testRebuildTimeIndexForOldMessages PASSED

kafka.log.LogTest > testSizeBasedLogRoll STARTED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > shouldNotDeleteSizeBasedSegmentsWhenUnderRetentionSize 
STARTED

kafka.log.LogTest > shouldNotDeleteSizeBasedSegmentsWhenUnderRetentionSize 
PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter STARTED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName STARTED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo STARTED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile STARTED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogTest > testBuildTimeIndexWhenNotAssigningOffsets STARTED

kafka.log.LogTest > testBuildTimeIndexWhenNotAssigningOffsets PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] STARTED

kafka.log

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-19 Thread radai
1. On write amplification: I don't see 6x the writes; at worst I see 2x the
writes - once to the "tx log", then read and written again to the destination
partition. If you have a replication factor != 1 then both the 1st and
the 2nd writes get replicated, but it is still a relative factor of 2x.
What am I missing? (See the sketch after this list.)

2. The effect of write amplification on broker throughput really depends on
the hardware you're running. Just as an example - here (
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/ssd960.html)
is a laptop SSD that can max out a 10gig ethernet NIC on writes. I would
expect that on "high performance" hardware Kafka would be CPU bound.

3. Why do writes to a TX need the same guarantees as "plain" writes? In
cases where the user can live with a TX rollback on a change of
leadership/broker crash, the TX log can be unreplicated, and even live in
the leader's memory. That would cut down on writes. This is also an
acceptable default in SQL - if your socket connection to a DB dies mid-TX,
your TX is toast (MySQL is even worse).

4. Even if we replicate the TX log, why do we need to re-read it and
re-write it to the underlying partition? If it's already written to disk, all
I would need is to make that file the current segment of the "real"
partition and I've avoided the double write (at the cost of complicating
segment management). If the data is replicated, fetchers could do the same.

5. On latency - you're right, what I'm suggesting would result in tx ordering
of messages, "read committed" semantics and therefore higher latency. It's
theoretically possible to implement "read uncommitted" on top of it, but it
would inevitably result in opt-in vs vanilla clients seeing a different
order of messages.

6. The added delay (vs your read uncommitted) would be roughly the time
span of a TX. Are you designing for long-running transactions? It seems to me
that if the common use case is to start a TX, deliver some batch of messages
that I already have on hand, then commit, the delay isn't very long.

7. The need to buffer (or re-read) messages on consumers who do not opt in
(which are expected to be the majority in your use case?), and to do so again
and again if clients reset to earlier offsets/reconsume, might make the
system as a whole less efficient.

8. What is the ratio of read-uncommitted vs read-committed clients you
expect to see?

9. What is the ratio of TX writes vs vanilla writes you expect to see?
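
(A back-of-the-envelope check of the 2x claim in point 1 - my sketch, assuming
every transactional message is written once to a tx-log topic and re-copied
once to the destination topic, both with replication factor r:

    W_{direct} = r
    W_{txlog}  = r + r = 2r      (tx-log write + destination write, each replicated)
    W_{txlog} / W_{direct} = 2

With r = 3 that is 3 physical writes vs 6, i.e. the absolute 3x-vs-6x numbers
quoted further down this thread, but relative to a plain produce the factor is 2.)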

On Mon, Dec 19, 2016 at 3:15 PM, Guozhang Wang  wrote:

> One more thing about the double journal proposal: when discussing
> this method back at LinkedIn, another issue raised besides double writing
> was that it would void the offset ordering and force people to accept
> "transaction ordering", that is, consumers will not see messages from the
> same partition in the order in which they were produced, but only in the order
> in which the corresponding transaction was committed. For some scenarios, we
> believe that offset ordering would still be preferred over transaction
> ordering, and that is why the KIP-98 proposal defaults to the former while
> leaving the door open if users want to switch to the latter.
>
>
> Guozhang
>
> On Mon, Dec 19, 2016 at 10:56 AM, Jay Kreps  wrote:
>
> > Hey Radai,
> >
> > I'm not sure if I fully understand what you are proposing, but I
> > interpreted it to be similar to a proposal we worked through back at
> > LinkedIn. The proposal was to commit to a central txlog topic, and then
> > recopy to the destination topic upon transaction commit. The observations
> > on that approach at the time were the following:
> >
> >1. It is cleaner since the output topics have only committed data!
> >2. You need full replication on the txlog topic to ensure atomicity.
> We
> >weren't able to come up with a solution where you buffer in memory or
> > use
> >renaming tricks the way you are describing. The reason is that once
> you
> >begin committing you must ensure that the commit eventually succeeds
> to
> >guarantee atomicity. If you use a transient store you might commit
> some
> >data and then have a server failure that causes you to lose the rest
> of
> > the
> >transaction.
> >3. Having a single log allows the reader to choose a "read
> uncommitted"
> >mode that hands out messages immediately. This is important for cases
> > where
> >latency is important, especially for stream processing topologies
> where
> >these latencies stack up across multiple stages.
> >
> > For the stream processing use case, item (2) is a bit of a deal killer.
> > This takes the cost of a transient message write (say the intermediate
> > result of a stream processing topology) from 3x writes (assuming 3x
> > replication) to 6x writes. This means you basically can't default it on.
> If
> > we can in fact get the cost down to a single buffered write (i.e. 1x the
> > data is written to memory and buffered to disk if the transaction is
> large)
> > as in the KIP-98 proposal wi
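
(To make the "write to a central txlog, then recopy on commit" flow discussed
above easier to follow, here is a deliberately tiny in-memory sketch. Every
name in it is invented for illustration - it uses no real Kafka API - and it
only models the two writes and the commit-time ordering Guozhang mentions;
replication, failure handling and read-uncommitted readers are left out.)

import java.util.ArrayList;
import java.util.List;

// Toy model of the "central tx log + re-copy on commit" design discussed above.
public class DoubleJournalSketch {

    static class Msg {
        final String txId;
        final String payload;
        Msg(String txId, String payload) { this.txId = txId; this.payload = payload; }
    }

    // First write: every transactional message lands in the tx log
    // (in the real design this topic would be replicated).
    static final List<Msg> txLog = new ArrayList<>();

    // Second write: on commit the messages are copied to the destination partition.
    static final List<String> destinationPartition = new ArrayList<>();

    static void produceTransactional(String txId, String payload) {
        txLog.add(new Msg(txId, payload));                       // write #1
    }

    static void commit(String txId) {
        for (Msg m : txLog) {
            if (m.txId.equals(txId)) {
                destinationPartition.add(m.payload);             // write #2
            }
        }
    }

    public static void main(String[] args) {
        produceTransactional("tx1", "a1");
        produceTransactional("tx2", "b1");
        produceTransactional("tx1", "a2");
        commit("tx2");   // tx2 commits first
        commit("tx1");
        // The destination partition is ordered by commit time, not produce time:
        System.out.println(destinationPartition);                // prints [b1, a1, a2]
    }
}

(A "read uncommitted" reader in this toy model would have to consume the tx log
itself, which is exactly where the ordering/opt-in trade-off above comes from.)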

Re: [VOTE] 0.10.1.1 RC1

2016-12-19 Thread Vahid S Hashemian
Hi Guozhang,

I also verified the quickstart on Ubuntu and Mac. +1 on those.

On Windows OS there are a couple of issues for which the following PRs 
exist:
- https://github.com/apache/kafka/pull/2146 (already merged to trunk)
- https://github.com/apache/kafka/pull/2238 (open)

These issues are not specific to this RC, so the fixes can be included in a 
future release.

Thanks again for running the release.

Regards.
--Vahid




From:   Jun Rao 
To: "us...@kafka.apache.org" , 
"dev@kafka.apache.org" 
Date:   12/19/2016 02:47 PM
Subject:Re: [VOTE] 0.10.1.1 RC1



Hi, Guozhang,

Thanks for preparing the release. Verified quickstart. +1

Jun

On Thu, Dec 15, 2016 at 1:29 PM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second, and hopefully the last candidate for the release of
> Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
> includes fixes and improvements from 30 JIRAs. See the release notes for
> more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 20 December, 8pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> NOTE the artifacts include the ones built from Scala 2.12.1 and Java 8,
> which are treated as pre-alpha artifacts for the Scala community to try
> and test out:
>
> https://repository.apache.org/content/groups/staging/org/
> apache/kafka/kafka_2.12/0.10.1.1/
>
> We will formally add the scala 2.12 support in future minor releases.
>
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> c3638376708ee6c02dfe4e57747acae0126fa6e7
>
>
> Thanks,
> Guozhang
>
> --
> -- Guozhang
>






Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread radai
This KIP fixes a "bug" (quirk?) that arises when people implement headers
"in V" (in the payload part of a message).
If you have proper headers you obviously don't need to stick them in V
and so won't run into this, but it's still a valid issue.
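
(To illustrate the quirk with a made-up example of "headers in V" - this is
hypothetical application code, not anything in Kafka itself: once headers are
packed into the value, a logically deleted record still carries a non-null
value, so under today's null-value rule it can never be a compaction tombstone.)

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "headers in V" wrapper used before native headers exist:
// the application packs its own headers plus the business payload into the value.
public class HeadersInValueQuirk {

    static byte[] wrap(Map<String, String> headers, String payload) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append('=').append(h.getValue()).append(';');
        }
        sb.append('|').append(payload == null ? "" : payload);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Today's compaction rule: only a null value marks a delete.
    static boolean isDeleteToday(byte[] value) {
        return value == null;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new TreeMap<>();
        headers.put("trace-id", "abc123");

        // The app wants to delete the key but keep its routing/audit headers,
        // so the value it produces is non-null even though the payload is gone.
        byte[] value = wrap(headers, null);

        System.out.println(isDeleteToday(value));   // false -- never treated as a delete
        // An explicit tombstone flag (KIP-87) or native headers (KIP-82) avoids this.
    }
}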

On Mon, Dec 19, 2016 at 3:06 PM, Jay Kreps  wrote:

> Makes sense!
>
> -Jay
>
> On Mon, Dec 19, 2016 at 2:40 PM, Michael Pearce 
> wrote:
>
> > Wow just read that def over tired. Hopefully it makes sense. Or you get
> > the gist at least.
> >
> > 
> > From: Michael Pearce 
> > Sent: Monday, December 19, 2016 9:19:02 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
> >
> > Hi Jay,
> >
> > Agreed this stemmed as offshoot from KIP-82.
> >
> > Which our main driver for was to be able to have some headers for a null
> > value as such for our routing, audit, tracing and a few other bits which
> > currently we are forced to do with a message wrapper, if we all agreed on
> > KIP-82 that we need native headers and look to implement that the push
> for
> > this would dissipate.
> >
> > This KIP would allow for though one use case that comes to mind we could
> > see which is to have business data with a delete. Though as said this
> isn't
> > something we are pushing for think really we would have.
> >
> > As such in summary yes, if you want to fully support KIP-82 and we can
> get
> > that agreed in principle and a target release version, I think quite a
> few
> > guys at LinkedIn are quite pro it too ;) I'm happy to drop this one.
> >
> > Cheers
> > Mike
> >
> > Sent using OWA for iPhone
> > 
> > From: Jay Kreps 
> > Sent: Monday, December 19, 2016 8:51:23 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
> >
> > Hey Michael,
> >
> > Here is the compatibility concern I have:
> >
> >1. You have a consumer app that relies on value == null to indicate a
> >delete (current semantics).
> >2. You upgrade Kafka and your clients.
> >3. Some producer starts using the tombstone field in combination with
> >non-null.
> >
> > I share Ismael's dislike of setting tombstones on records with null
> values.
> > This makes sense as a transitional state, but as an end state its a bit
> > weird. You'd expect to be able to mix null values and tombstones, and
> have
> > the null values remain and the tombstones get compacted. However what
> will
> > happen is both will be compacted and upon debugging this you'll learn
> that
> > we sometimes use null in the value to indicate tombstone. Ismael's
> solution
> > is a bigger compatibility break, though, so not sure if that is better.
> >
> > My other question is the relationship to KIP-82. My read is that this KIP
> > solves some but not all of the problems KIP-82 is intended for. KIP-82,
> on
> > the other hand, seems to address most of the motivating uses for this
> KIP.
> > The exception is maybe item (5) on the list where you want to
> simultaneous
> > delete and convey some information to subscribers, but I couldn't
> construct
> > a concrete examples for that one. Do we need to rationalize these two
> KIPs?
> > That is, do you still advocate this proposal if we do KIP-82 and vice
> > versa? As you may have noticed I'm somewhat emotionally invested in the
> > simplicity of the core data model, so my default position is let's try to
> > avoid stuffing more stuff in, but if we have to add stuff I like each of
> > these individually more than doing both. :-)
> >
> > -Jay
> >
> >
> >
> >
> > On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce 
> > wrote:
> >
> > > Hi Jay
> > >
> > > I disagree here that we are breaking any compatibility, we went through
> > > this on the discussion thread in fact with the help of that thread is
> how
> > > the kip came to the solution.
> > >
> > > Also on the supported combinations front you mention, we are not taking
> > > anything away afaik.
> > >
> > > Currently supported are only are:
> > > Null value = delete
> > > Non-null value = non delete
> > >
> > > With this kip we would support
> > > Null value + tombstone = delete
> > > Non null value + tombstone = delete
> > > Non null value + no tombstone = non delete
> > >
> > > As for the alternative idea, this is simply a new policy, how can there
> > be
> > > confusion here? For this policy it would be explicit that tombstone
> > marker
> > > would need to be set for a delete.
> > >
> > > I'm going to vent a little now as starting to get quite frustrated.
> > >
> > > We are going round in circles on kip-82 as per use cases there is now
> > many
> > > use cases, how many more are needed? just because confluent don't see
> > these
> > > doesn't mean they aren't real use cases other have, this is the point
> of
> > > the Apache foundation, it shouldn't be the view of just one
> organisation.
> > > It really is getting a feeling of the NIH syndrome. Rather than it
> being
> > > co
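
(A compact reading of the delete-marker combinations Michael lists in the quoted
message above. The methods below are illustrative only - they are not the actual
log cleaner code - and the null-value-without-tombstone case, which the list does
not enumerate, is treated as a delete here purely as an assumption.)

// Sketch of the current vs proposed tombstone semantics from the thread above.
public class TombstoneSemanticsSketch {

    // Current rule: a record is a delete marker iff its value is null.
    static boolean isDeleteCurrent(byte[] value) {
        return value == null;
    }

    // Proposed rule, per the combinations listed above:
    //   null value     + tombstone    -> delete
    //   non-null value + tombstone    -> delete
    //   non-null value + no tombstone -> non-delete
    static boolean isDeleteProposed(byte[] value, boolean tombstone) {
        return tombstone || value == null;
    }

    public static void main(String[] args) {
        byte[] v = new byte[] {1, 2, 3};
        System.out.println(isDeleteCurrent(v));            // false
        System.out.println(isDeleteProposed(v, true));     // true: a delete that still carries a value
        System.out.println(isDeleteProposed(v, false));    // false
        System.out.println(isDeleteCurrent(null));         // true (today's only way to delete)
    }
}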

Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread radai
I didn't mean to sound as if I was insisting. What I actually mean is that it would still
be a valid issue, but of much less concern.

On Mon, Dec 19, 2016 at 7:50 PM, radai  wrote:

> This KIP fixes a "bug" (quirk?) that arises when people implement headers
> "in V" (in the payload part of a message).
> If you have proper headers you obviously don't need to stick them in V
> and so won't run into this, but it's still a valid issue.
>
> On Mon, Dec 19, 2016 at 3:06 PM, Jay Kreps  wrote:
>
>> Makes sense!
>>
>> -Jay
>>
>> On Mon, Dec 19, 2016 at 2:40 PM, Michael Pearce 
>> wrote:
>>
>> > Wow just read that def over tired. Hopefully it makes sense. Or you get
>> > the gist at least.
>> >
>> > 
>> > From: Michael Pearce 
>> > Sent: Monday, December 19, 2016 9:19:02 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Hi Jay,
>> >
>> > Agreed this stemmed as offshoot from KIP-82.
>> >
>> > Which our main driver for was to be able to have some headers for a null
>> > value as such for our routing, audit, tracing and a few other bits which
>> > currently we are forced to do with a message wrapper, if we all agreed
>> on
>> > KIP-82 that we need native headers and look to implement that the push
>> for
>> > this would dissipate.
>> >
>> > This KIP would allow for though one use case that comes to mind we could
>> > see which is to have business data with a delete. Though as said this
>> isn't
>> > something we are pushing for think really we would have.
>> >
>> > As such in summary yes, if you want to fully support KIP-82 and we can
>> get
>> > that agreed in principle and a target release version, I think quite a
>> few
>> > guys at LinkedIn are quite pro it too ;) I'm happy to drop this one.
>> >
>> > Cheers
>> > Mike
>> >
>> > Sent using OWA for iPhone
>> > 
>> > From: Jay Kreps 
>> > Sent: Monday, December 19, 2016 8:51:23 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Hey Michael,
>> >
>> > Here is the compatibility concern I have:
>> >
>> >1. You have a consumer app that relies on value == null to indicate a
>> >delete (current semantics).
>> >2. You upgrade Kafka and your clients.
>> >3. Some producer starts using the tombstone field in combination with
>> >non-null.
>> >
>> > I share Ismael's dislike of setting tombstones on records with null
>> values.
>> > This makes sense as a transitional state, but as an end state its a bit
>> > weird. You'd expect to be able to mix null values and tombstones, and
>> have
>> > the null values remain and the tombstones get compacted. However what
>> will
>> > happen is both will be compacted and upon debugging this you'll learn
>> that
>> > we sometimes use null in the value to indicate tombstone. Ismael's
>> solution
>> > is a bigger compatibility break, though, so not sure if that is better.
>> >
>> > My other question is the relationship to KIP-82. My read is that this
>> KIP
>> > solves some but not all of the problems KIP-82 is intended for. KIP-82,
>> on
>> > the other hand, seems to address most of the motivating uses for this
>> KIP.
>> > The exception is maybe item (5) on the list where you want to
>> simultaneous
>> > delete and convey some information to subscribers, but I couldn't
>> construct
>> > a concrete examples for that one. Do we need to rationalize these two
>> KIPs?
>> > That is, do you still advocate this proposal if we do KIP-82 and vice
>> > versa? As you may have noticed I'm somewhat emotionally invested in the
>> > simplicity of the core data model, so my default position is let's try
>> to
>> > avoid stuffing more stuff in, but if we have to add stuff I like each of
>> > these individually more than doing both. :-)
>> >
>> > -Jay
>> >
>> >
>> >
>> >
>> > On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce
>> > wrote:
>> >
>> > > Hi Jay
>> > >
>> > > I disagree here that we are breaking any compatibility, we went
>> through
>> > > this on the discussion thread in fact with the help of that thread is
>> how
>> > > the kip came to the solution.
>> > >
>> > > Also on the supported combinations front you mention, we are not
>> taking
>> > > anything away afaik.
>> > >
>> > > Currently supported are only are:
>> > > Null value = delete
>> > > Non-null value = non delete
>> > >
>> > > With this kip we would support
>> > > Null value + tombstone = delete
>> > > Non null value + tombstone = delete
>> > > Non null value + no tombstone = non delete
>> > >
>> > > As for the alternative idea, this is simply a new policy, how can
>> there
>> > be
>> > > confusion here? For this policy it would be explicit that tombstone
>> > marker
>> > > would need to be set for a delete.
>> > >
>> > > I'm going to vent a little now as starting to get quite frustrated.
>> > >
>> > > We are going round in circles on kip-82 as per use cases there is now
>> > many
>> > > use ca

Jenkins build is back to normal : kafka-trunk-jdk8 #1114

2016-12-19 Thread Apache Jenkins Server
See 



Jenkins build is back to normal : kafka-trunk-jdk7 #1768

2016-12-19 Thread Apache Jenkins Server
See 



[jira] [Resolved] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time

2016-12-19 Thread Json Tu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Json Tu resolved KAFKA-4447.

Resolution: Fixed

> Controller resigned but it also acts as a controller for a long time 
> -
>
> Key: KAFKA-4447
> URL: https://issues.apache.org/jira/browse/KAFKA-4447
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Environment: Linux Os
>Reporter: Json Tu
>  Labels: reliability
> Fix For: 0.10.2.0
>
> Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we executed the following operations.
> 1. We executed some topic partition reassignments from one node to the other 9 nodes in 
> the cluster, which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange, read 
> all partition reassignment rules from the zk path, and executed 
> onPartitionReassignment for all partitions that matched the conditions.
> 3. But the controller's session expired from zk, after which some of the 9 nodes 
> also expired from zk.
> 5. The controller then invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as controller for about 3 
> minutes, which can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

2016-12-19 Thread Michael Pearce
Agreed. As I said, there is still a use case, just I wouldn't be pushing for it; 
the need for it reduces for me.

Sent using OWA for iPhone

From: radai 
Sent: Tuesday, December 20, 2016 3:51:33 AM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag

I didn't mean to sound as if I was insisting. What I actually mean is that it would still
be a valid issue, but of much less concern.

On Mon, Dec 19, 2016 at 7:50 PM, radai  wrote:

> This KIP fixes a "bug" (quirk?) that arises when people implement headers
> "in V" (in the payload part of a message).
> If you have proper headers you obviously don't need to stick them in V
> and so won't run into this, but it's still a valid issue.
>
> On Mon, Dec 19, 2016 at 3:06 PM, Jay Kreps  wrote:
>
>> Makes sense!
>>
>> -Jay
>>
>> On Mon, Dec 19, 2016 at 2:40 PM, Michael Pearce 
>> wrote:
>>
>> > Wow just read that def over tired. Hopefully it makes sense. Or you get
>> > the gist at least.
>> >
>> > 
>> > From: Michael Pearce 
>> > Sent: Monday, December 19, 2016 9:19:02 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Hi Jay,
>> >
>> > Agreed this stemmed as offshoot from KIP-82.
>> >
>> > Which our main driver for was to be able to have some headers for a null
>> > value as such for our routing, audit, tracing and a few other bits which
>> > currently we are forced to do with a message wrapper, if we all agreed
>> on
>> > KIP-82 that we need native headers and look to implement that the push
>> for
>> > this would dissipate.
>> >
>> > This KIP would allow for though one use case that comes to mind we could
>> > see which is to have business data with a delete. Though as said this
>> isn't
>> > something we are pushing for think really we would have.
>> >
>> > As such in summary yes, if you want to fully support KIP-82 and we can
>> get
>> > that agreed in principle and a target release version, I think quite a
>> few
>> > guys at LinkedIn are quite pro it too ;) I'm happy to drop this one.
>> >
>> > Cheers
>> > Mike
>> >
>> > Sent using OWA for iPhone
>> > 
>> > From: Jay Kreps 
>> > Sent: Monday, December 19, 2016 8:51:23 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [VOTE] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Hey Michael,
>> >
>> > Here is the compatibility concern I have:
>> >
>> >1. You have a consumer app that relies on value == null to indicate a
>> >delete (current semantics).
>> >2. You upgrade Kafka and your clients.
>> >3. Some producer starts using the tombstone field in combination with
>> >non-null.
>> >
>> > I share Ismael's dislike of setting tombstones on records with null
>> values.
>> > This makes sense as a transitional state, but as an end state its a bit
>> > weird. You'd expect to be able to mix null values and tombstones, and
>> have
>> > the null values remain and the tombstones get compacted. However what
>> will
>> > happen is both will be compacted and upon debugging this you'll learn
>> that
>> > we sometimes use null in the value to indicate tombstone. Ismael's
>> solution
>> > is a bigger compatibility break, though, so not sure if that is better.
>> >
>> > My other question is the relationship to KIP-82. My read is that this
>> KIP
>> > solves some but not all of the problems KIP-82 is intended for. KIP-82,
>> on
>> > the other hand, seems to address most of the motivating uses for this
>> KIP.
>> > The exception is maybe item (5) on the list where you want to
>> simultaneous
>> > delete and convey some information to subscribers, but I couldn't
>> construct
>> > a concrete examples for that one. Do we need to rationalize these two
>> KIPs?
>> > That is, do you still advocate this proposal if we do KIP-82 and vice
>> > versa? As you may have noticed I'm somewhat emotionally invested in the
>> > simplicity of the core data model, so my default position is let's try
>> to
>> > avoid stuffing more stuff in, but if we have to add stuff I like each of
>> > these individually more than doing both. :-)
>> >
>> > -Jay
>> >
>> >
>> >
>> >
>> > On Fri, Dec 16, 2016 at 12:16 PM, Michael Pearce
>> > wrote:
>> >
>> > > Hi Jay
>> > >
>> > > I disagree here that we are breaking any compatibility, we went
>> through
>> > > this on the discussion thread in fact with the help of that thread is
>> how
>> > > the kip came to the solution.
>> > >
>> > > Also on the supported combinations front you mention, we are not
>> taking
>> > > anything away afaik.
>> > >
>> > > Currently supported are only are:
>> > > Null value = delete
>> > > Non-null value = non delete
>> > >
>> > > With this kip we would support
>> > > Null value + tombstone = delete
>> > > Non null value + tombstone = delete
>> > > Non null value + no tombstone = non delete
>> > >
>> > > As for the alternative idea, this is simply a new policy, how can
>> there
>>