Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient

2018-04-03 Thread Peter Toth
+1 (non binding)

Thanks Viktor.
Please remove "Unknown" AlterOperation type from Wire Format Types section
as you did from the AdminClient APIs section.


On Wed, Mar 21, 2018 at 5:41 PM, Viktor Somogyi 
wrote:

> Hi Everyone,
>
> I've started a vote on KIP-248
>  ConfigCommand+That+Uses+The+New+AdminClient#KIP-248-
> CreateNewConfigCommandThatUsesTheNewAdminClient-DescribeQuotas>
> a few weeks ago but at the time I got a couple more comments and it was
> very close to 1.1 feature freeze, people were occupied with that, so I
> wanted to restart the vote on this.
>
>
> *Summary of the KIP*
> For those who don't have context, I thought I'd summarize it in a few
> sentences.
> *Problem & Motivation: *The basic problem that the KIP tries to solve is
> that kafka-configs.sh (which in turn uses the ConfigCommand class) uses a
> direct ZooKeeper connection. This is not desirable as getting around the
> broker opens up security issues and prevents the tool from being used in
> deployments where only the brokers are exposed to clients. Also, a somewhat
> smaller motivation is to rewrite the tool in Java as part of the tools
> component so we can get rid of requiring the core module on the classpath
> for the kafka-configs tool.
> *Solution:*
> - I've designed 2 new protocols: DescribeQuotas and AlterQuotas.
> - I've also redesigned the output format of the command line tool so that it
> provides a nicer result.
> - kafka-configs.[sh/bat] will use a new Java-based ConfigCommand that is
> placed in the tools module.
>
>
> I'd be happy to receive any votes or feedback on this.
>
> Regards,
> Viktor
>


Re: [VOTE] KIP-249: Add Delegation Token Operations to Kafka Admin Client

2018-04-03 Thread Rajini Sivaram
+1 (binding)

Thanks for the KIP, Manikumar!
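
For readers without the context, here is a rough sketch of how the delegation
token operations proposed in KIP-249 might be used from the Java AdminClient
once the changes land. The method and result names below follow the KIP and the
existing delegation token classes; treat the exact signatures as assumptions
until the PR is merged.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.security.token.delegation.DelegationToken;

public class DelegationTokenSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Create a token for the authenticated principal (default options).
            DelegationToken token = admin.createDelegationToken().delegationToken().get();

            // Renew it before it expires, then explicitly expire it, both keyed by its HMAC.
            long newExpiry = admin.renewDelegationToken(token.hmac()).expiryTimestamp().get();
            System.out.println("Token renewed until " + newExpiry);
            admin.expireDelegationToken(token.hmac()).expiryTimestamp().get();

            // List the tokens visible to this principal.
            admin.describeDelegationToken().delegationTokens().get()
                 .forEach(t -> System.out.println(t.tokenInfo()));
        }
    }
}
{code}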

On Thu, Mar 29, 2018 at 5:34 PM, Gwen Shapira  wrote:

> +1
>
> Thank you and sorry for missing it the first time around.
>
> On Thu, Mar 29, 2018 at 3:05 AM, Manikumar 
> wrote:
>
> > I'm bumping this up to get some attention.
> >
> >
> > On Wed, Jan 24, 2018 at 3:36 PM, Satish Duggana <
> satish.dugg...@gmail.com>
> > wrote:
> >
> > >  +1, thanks for the KIP.
> > >
> > > ~Satish.
> > >
> > > On Wed, Jan 24, 2018 at 5:09 AM, Jun Rao  wrote:
> > >
> > > > Hi, Mani,
> > > >
> > > > Thanks for the KIP. +1
> > > >
> > > > Jun
> > > >
> > > > On Sun, Jan 21, 2018 at 7:44 AM, Manikumar <
> manikumar.re...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a vote on KIP-249 which would add delegation
> > > token
> > > > > operations
> > > > > to Java Admin Client.
> > > > >
> > > > > We have merged DelegationToken API PR recently. We want to include
> > > admin
> > > > > client changes in the upcoming release. This will make the feature
> > > > > complete.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > > > >
> > > > > Thanks,
> > > > >
> > > >
> > >
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient

2018-04-03 Thread Rajini Sivaram
+1 (binding)

Thanks for the KIP, Viktor!

On Tue, Apr 3, 2018 at 8:25 AM, Peter Toth  wrote:

> +1 (non binding)
>
> Thanks Viktor.
> Please remove "Unknown" AlterOperation type from Wire Format Types section
> as you did from the AdminClient APIs section.
>
>
> On Wed, Mar 21, 2018 at 5:41 PM, Viktor Somogyi 
> wrote:
>
> > Hi Everyone,
> >
> > I've started a vote on KIP-248
> >  > ConfigCommand+That+Uses+The+New+AdminClient#KIP-248-
> > CreateNewConfigCommandThatUsesTheNewAdminClient-DescribeQuotas>
> > a few weeks ago but at the time I got a couple more comments and it was
> > very close to 1.1 feature freeze, people were occupied with that, so I
> > wanted to restart the vote on this.
> >
> >
> > *Summary of the KIP*
> > For those who don't have context, I thought I'd summarize it in a few
> > sentences.
> > *Problem & Motivation: *The basic problem that the KIP tries to solve is
> > that kafka-configs.sh (which in turn uses the ConfigCommand class) uses a
> > direct ZooKeeper connection. This is not desirable as getting around the
> > broker opens up security issues and prevents the tool from being used in
> > deployments where only the brokers are exposed to clients. Also, a somewhat
> > smaller motivation is to rewrite the tool in Java as part of the tools
> > component so we can get rid of requiring the core module on the classpath
> > for the kafka-configs tool.
> > *Solution:*
> > - I've designed 2 new protocols: DescribeQuotas and AlterQuotas.
> > - I've also redesigned the output format of the command line tool so that it
> > provides a nicer result.
> > - kafka-configs.[sh/bat] will use a new Java-based ConfigCommand that is
> > placed in the tools module.
> >
> >
> > I'd be happy to receive any votes or feedback on this.
> >
> > Regards,
> > Viktor
> >
>


[jira] [Created] (KAFKA-6741) Transient test failure: SslTransportLayerTest.testNetworkThreadTimeRecorded

2018-04-03 Thread Manikumar (JIRA)
Manikumar created KAFKA-6741:


 Summary: Transient test failure: 
SslTransportLayerTest.testNetworkThreadTimeRecorded
 Key: KAFKA-6741
 URL: https://issues.apache.org/jira/browse/KAFKA-6741
 Project: Kafka
  Issue Type: Bug
Reporter: Manikumar


debug logs:

{code}
 [2018-04-03 14:51:33,365] DEBUG Added sensor with name connections-closed: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,368] DEBUG Added sensor with name connections-created: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,368] DEBUG Added sensor with name 
successful-authentication: (org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,368] DEBUG Added sensor with name failed-authentication: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,368] DEBUG Added sensor with name bytes-sent-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,369] DEBUG Added sensor with name bytes-sent: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,370] DEBUG Added sensor with name bytes-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,370] DEBUG Added sensor with name select-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,371] DEBUG Added sensor with name io-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,379] DEBUG Removed sensor with name connections-closed: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,379] DEBUG Removed sensor with name connections-created: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,379] DEBUG Removed sensor with name 
successful-authentication: (org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,380] DEBUG Removed sensor with name 
failed-authentication: (org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,380] DEBUG Removed sensor with name bytes-sent-received: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,380] DEBUG Removed sensor with name bytes-sent: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,380] DEBUG Removed sensor with name bytes-received: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,382] DEBUG Removed sensor with name select-time: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,382] DEBUG Removed sensor with name io-time: 
(org.apache.kafka.common.metrics.Metrics:447)
 [2018-04-03 14:51:33,382] DEBUG Added sensor with name connections-closed: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,382] DEBUG Added sensor with name connections-created: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,383] DEBUG Added sensor with name 
successful-authentication: (org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,383] DEBUG Added sensor with name failed-authentication: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,383] DEBUG Added sensor with name bytes-sent-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,384] DEBUG Added sensor with name bytes-sent: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,384] DEBUG Added sensor with name bytes-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,384] DEBUG Added sensor with name select-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,384] DEBUG Added sensor with name io-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,444] DEBUG Added sensor with name connections-closed: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,445] DEBUG Added sensor with name connections-created: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,445] DEBUG Added sensor with name 
successful-authentication: (org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,445] DEBUG Added sensor with name failed-authentication: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,446] DEBUG Added sensor with name bytes-sent-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,446] DEBUG Added sensor with name bytes-sent: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,446] DEBUG Added sensor with name bytes-received: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,447] DEBUG Added sensor with name select-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,447] DEBUG Added sensor with name io-time: 
(org.apache.kafka.common.metrics.Metrics:414)
 [2018-04-03 14:51:33,890] DEBUG Created socket with SO_RCVBUF = 326640, 
SO_SNDBUF = 65328, SO_TIMEOUT = 0 to node 0 
(org.apache.kafka.common.network.Selector:474)
 [2018-04-03 14:51:33,892] DEBUG Added sensor with name 
node-127.0.0.1:53543-127.0.

[jira] [Created] (KAFKA-6742) TopologyTestDriver error when dealing with stores from GlobalKTable

2018-04-03 Thread Valentino Proietti (JIRA)
Valentino Proietti created KAFKA-6742:
-

 Summary: TopologyTestDriver error when dealing with stores from 
GlobalKTable
 Key: KAFKA-6742
 URL: https://issues.apache.org/jira/browse/KAFKA-6742
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.1.0
Reporter: Valentino Proietti


This junit test simply fails:

{code:java}
@Test
public void globalTable() {
    StreamsBuilder builder = new StreamsBuilder();

    @SuppressWarnings("unused")
    final KTable<String, String> localTable = builder
        .table("local",
               Consumed.with(Serdes.String(), Serdes.String()),
               Materialized.as("localStore"));

    @SuppressWarnings("unused")
    final GlobalKTable<String, String> globalTable = builder
        .globalTable("global",
                     Consumed.with(Serdes.String(), Serdes.String()),
                     Materialized.as("globalStore"));

    Properties props = new Properties();
    props.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "test");
    props.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost");

    TopologyTestDriver testDriver = new TopologyTestDriver(builder.build(), props);

    final KeyValueStore<String, String> localStore = testDriver.getKeyValueStore("localStore");
    Assert.assertNotNull(localStore);
    Assert.assertNotNull(testDriver.getAllStateStores().get("localStore"));

    final KeyValueStore<String, String> globalStore = testDriver.getKeyValueStore("globalStore");
    Assert.assertNotNull(globalStore);
    Assert.assertNotNull(testDriver.getAllStateStores().get("globalStore"));

    final ConsumerRecordFactory<String, String> crf =
        new ConsumerRecordFactory<>(new StringSerializer(), new StringSerializer());
    testDriver.pipeInput(crf.create("local", "one", "TheOne"));
    testDriver.pipeInput(crf.create("global", "one", "TheOne"));

    Assert.assertEquals("TheOne", localStore.get("one"));
    Assert.assertEquals("TheOne", globalStore.get("one"));
}
{code}

to make it work I had to modify the TopologyTestDriver class as follows:

{code:java}
...
    public Map<String, StateStore> getAllStateStores() {
//        final Map<String, StateStore> allStores = new HashMap<>();
//        for (final String storeName : internalTopologyBuilder.allStateStoreName()) {
//            allStores.put(storeName, ((ProcessorContextImpl) task.context()).getStateMgr().getStore(storeName));
//        }
//        return allStores;
        // FIXME
        final ProcessorStateManager psm = ((ProcessorContextImpl) task.context()).getStateMgr();
        final Map<String, StateStore> allStores = new HashMap<>();
        for (final String storeName : internalTopologyBuilder.allStateStoreName()) {
            StateStore res = psm.getStore(storeName);
            if (res == null)
                res = psm.getGlobalStore(storeName);
            allStores.put(storeName, res);
        }
        return allStores;
    }
...
    public StateStore getStateStore(final String name) {
//        return ((ProcessorContextImpl) task.context()).getStateMgr().getStore(name);
        // FIXME
        final ProcessorStateManager psm = ((ProcessorContextImpl) task.context()).getStateMgr();
        StateStore res = psm.getStore(name);
        if (res == null)
            res = psm.getGlobalStore(name);
        return res;
    }
...
{code}

 

moreover I think it would be very useful to make the internal MockProducer
public for testing cases where a producer is used alongside the "normal"
stream processing, by adding the method:

{code:java}
    /**
     * @return records sent with this producer are automatically streamed to
     *         the topology.
     */
    public final Producer<byte[], byte[]> getProducer() {
        return producer;
    }
{code}

 

unfortunately this introduces another problem, which can be verified by adding
the following lines to the previous junit test:

{code:java}
...
    //
    ConsumerRecord<byte[], byte[]> cr = crf.create("dummy", "two", "Second"); // just to serialize keys and values

    testDriver.getProducer().send(new ProducerRecord<>("local", null, cr.timestamp(), cr.key(), cr.value()));
    testDriver.getProducer().send(new ProducerRecord<>("global", null, cr.timestamp(), cr.key(), cr.value()));
    testDriver.advanceWallClockTime(0);

    Assert.assertEquals("TheOne", localStore.get("one"));
    Assert.assertEquals("Second", localStore.get("two"));
    Assert.assertEquals("TheOne", globalStore.get("one"));
    Assert.assertEquals("Second", globalStore.get("two"));
}
{code}

 

that could be fixed with:

{code:java}
    private void captureOutputRecords() {
        // Capture all the records sent to the producer ...
        final List<ProducerRecord<byte[], byte[]>> output = producer.history();
        producer.clear();
        for (final ProducerRecord<byte[], byte[]> record : output) {
            Queue<ProducerRecord<byte[], byte[]>> outputRecords =
                outputRecordsByTopic.get(record.topic());
{code}


Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient

2018-04-03 Thread Attila Sasvári
Thanks for working on it, Viktor.

It looks good to me, but I have some questions:
- I see a new type DOUBLE is used for quota_value, and it is not listed
among the primitive types in the Kafka protocol guide. Can you add some
more details?
- I am not sure that using an environment variable (i.e. USE_OLD_COMMAND) is
the best way to control the behaviour of kafka-configs.sh. In other scripts
(e.g. console-consumer) an argument is passed (e.g. --new-consumer). If we
still want to use it, then I would suggest something like
USE_OLD_KAFKA_CONFIG_COMMAND. What do you think?
- I have seen maps like Map<..., Collection<...>>.
If a List is the key type, you should make sure that this
List is immutable. Have you considered introducing a new wrapper class?
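
For illustration only (not something the KIP defines), a minimal sketch of the
kind of wrapper class meant here; it takes a defensive, unmodifiable copy of the
list so the map key cannot be mutated after insertion:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: the name and element type are placeholders.
public final class ImmutableListKey<T> {
    private final List<T> parts;

    public ImmutableListKey(List<T> parts) {
        // Defensive copy wrapped as unmodifiable, so hashCode()/equals() stay stable.
        this.parts = Collections.unmodifiableList(new ArrayList<>(parts));
    }

    public List<T> parts() {
        return parts;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImmutableListKey && parts.equals(((ImmutableListKey<?>) o).parts);
    }

    @Override
    public int hashCode() {
        return parts.hashCode();
    }
}
{code}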

Regards,
- Attila

On Thu, Mar 29, 2018 at 1:46 PM, zhenya Sun  wrote:
>
>
>
> +1 (non-binding)
>
>
> | |
> zhenya Sun
> 邮箱:toke...@126.com
> |
>
> 签名由 网易邮箱大师 定制
>
> On 03/29/2018 19:40, Sandor Murakozi wrote:
> +1 (non-binding)
>
> Thanks for the KIP, Viktor
>
> On Wed, Mar 21, 2018 at 5:41 PM, Viktor Somogyi 
> wrote:
>
> > Hi Everyone,
> >
> > I've started a vote on KIP-248
> >  > ConfigCommand+That+Uses+The+New+AdminClient#KIP-248-
> > CreateNewConfigCommandThatUsesTheNewAdminClient-DescribeQuotas>
> > a few weeks ago but at the time I got a couple more comments and it was
> > very close to 1.1 feature freeze, people were occupied with that, so I
> > wanted to restart the vote on this.
> >
> >
> > *Summary of the KIP*
> > For those who don't have context, I thought I'd summarize it in a few
> > sentences.
> > *Problem & Motivation: *The basic problem that the KIP tries to solve is
> > that kafka-configs.sh (which in turn uses the ConfigCommand class) uses a
> > direct ZooKeeper connection. This is not desirable as getting around the
> > broker opens up security issues and prevents the tool from being used in
> > deployments where only the brokers are exposed to clients. Also, a somewhat
> > smaller motivation is to rewrite the tool in Java as part of the tools
> > component so we can get rid of requiring the core module on the classpath
> > for the kafka-configs tool.
> > *Solution:*
> > - I've designed 2 new protocols: DescribeQuotas and AlterQuotas.
> > - I've also redesigned the output format of the command line tool so that it
> > provides a nicer result.
> > - kafka-configs.[sh/bat] will use a new Java-based ConfigCommand that is
> > placed in the tools module.
> >
> >
> > I'd be happy to receive any votes or feedback on this.
> >
> > Regards,
> > Viktor
> >


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-03 Thread Matt Farmer
I have made the requested updates to the KIP! :)

On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer  wrote:

> Ugh
>
> * I can update
>
> I need more caffeine...
>
> On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer  wrote:
>
>> I'm can update the rejected alternatives section as you describe.
>>
>> Also, adding a paragraph to the preCommit javadoc also seems like a
>> Very Very Good Idea™ so I'll make that update to the KIP as well.
>>
>> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch  wrote:
>>
>>> Thanks for the KIP proposal, Matt.
>>>
>>> You mention in the "Rejected Alternatives" section that you considered
>>> changing the signature of the `preCommit` method but rejected it because
>>> it
>>> would break backward compatibility. Strictly speaking, it is possible to
>>> do
>>> this without breaking compatibility by introducing a new `preCommit`
>>> method, deprecating the old one, and having the new implementation call
>>> the
>>> old one. Such an approach would be complicated, and I'm not sure it adds
>>> any value. In fact, one of the benefits of having a context object is
>>> that
>>> we can add methods like the one you're proposing without causing any
>>> compatibility issues. Anyway, it probably is worth updating this rejected
>>> alternative to be a bit more precise.
>>>
>>> Otherwise, I think this is a good approach, though I'd request that we
>>> update the `preCommit` JavaDoc to add a paragraph that explains this
>>> scenario. Thoughts?
>>>
>>> Randall
>>>
>>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu  wrote:
>>>
>>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
>>> should
>>> > suffice for now.
>>> >
>>> > Thanks
>>> >
>>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer  wrote:
>>> >
>>> > > Hey Ted,
>>> > >
>>> > > I have not, actually!
>>> > >
>>> > > Do you think that we're likely to add multiple states here soon?
>>> > >
>>> > > My instinct is to keep it simple until there are multiple states
>>> that we
>>> > > would want
>>> > > to consider. I really like the simplicity of just getting a boolean
>>> and
>>> > the
>>> > > implementation of WorkerSinkTask already passes around a boolean to
>>> > > indicate this is happening internally. We're really just shuttling
>>> that
>>> > > value into
>>> > > the context at the correct moments.
>>> > >
>>> > > Once we have multiple states, we could choose to provide a more
>>> > > appropriately
>>> > > named method (e.g. getState?) and reimplement isClosing by checking
>>> that
>>> > > enum
>>> > > without breaking compatibility.
>>> > >
>>> > > However, if we think multiple states here are imminent for some
>>> reason, I
>>> > > would
>>> > > be pretty easy to convince adding that would be worth the extra
>>> > complexity!
>>> > > :)
>>> > >
>>> > > Matt
>>> > >
>>> > > —
>>> > > Matt Farmer | Blog  | Twitter
>>> > > 
>>> > > GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07
>>> > >
>>> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu 
>>> wrote:
>>> > >
>>> > > > The enhancement gives SinkTaskContext state information.
>>> > > >
>>> > > > Have you thought of exposing the state retrieval as an enum
>>> (initially
>>> > > with
>>> > > > two values) ?
>>> > > >
>>> > > > Thanks
>>> > > >
>>> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer  wrote:
>>> > > >
>>> > > > > Hello all,
>>> > > > >
>>> > > > > I am proposing KIP-275 to improve Connect's SinkTaskContext so
>>> that
>>> > > Sinks
>>> > > > > can be informed
>>> > > > > in their preCommit hook if the hook is being invoked as a part
>>> of a
>>> > > > > rebalance or Connect
>>> > > > > shutdown.
>>> > > > >
>>> > > > > The KIP is here:
>>> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
>>> > > > action?pageId=75977607
>>> > > > >
>>> > > > > Please let me know what feedback y'all have. Thanks!
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


[jira] [Created] (KAFKA-6743) ConsumerPerformance fails to consume all messages on topics with large number of partitions

2018-04-03 Thread Alex Dunayevsky (JIRA)
Alex Dunayevsky created KAFKA-6743:
--

 Summary: ConsumerPerformance fails to consume all messages on 
topics with large number of partitions
 Key: KAFKA-6743
 URL: https://issues.apache.org/jira/browse/KAFKA-6743
 Project: Kafka
  Issue Type: Bug
  Components: core, tools
Affects Versions: 0.11.0.2
Reporter: Alex Dunayevsky


ConsumerPerformance fails to consume all messages on topics with a large number
of partitions due to a relatively short default polling-loop timeout (1000 ms)
that cannot be inspected or modified by the end user.

Demo: create a topic with 10 000 partitions, send 50 000 000 records of 100 bytes
each using kafka-producer-perf-test, and consume them using
kafka-consumer-perf-test (ConsumerPerformance). You will likely notice that the
number of records reported by kafka-consumer-perf-test is many times smaller
than the expected 50 000 000. This is a consequence of the ConsumerPerformance
implementation: in some cases it can take long enough to process/iterate through
a polled batch of records that the hardcoded polling-loop timeout expires, which
is probably not what we want from this utility.

We have two options (see the sketch below):
1) Increase the polling-loop timeout in the ConsumerPerformance implementation.
It defaults to 1000 ms and is hardcoded, so it cannot be changed today, but we
could expose it as an OPTIONAL kafka-consumer-perf-test parameter so that it is
configurable at the script level by the end user.
2) Decrease max.poll.records at the consumer config level. This is not a great
option, though, since we do not want to touch the default settings.
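
For illustration, a simplified sketch of the idle-timeout poll loop described
above. This is not the actual ConsumerPerformance code; the bootstrap server,
group id and topic name are placeholders.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class PollLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "perf-test");
        try (KafkaConsumer<byte[], byte[]> consumer =
                 new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.subscribe(Collections.singletonList("test"));

            long idleTimeoutMs = 1000L; // hardcoded in ConsumerPerformance; option 1 would make it a parameter
            long lastConsumedMs = System.currentTimeMillis();
            long consumed = 0;

            // The loop exits once nothing has been consumed for idleTimeoutMs.
            // Slow per-batch processing eats into that budget and can end the loop
            // before the topic has been fully consumed.
            while (System.currentTimeMillis() - lastConsumedMs <= idleTimeoutMs) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    consumed++;
                    lastConsumedMs = System.currentTimeMillis();
                }
            }
            System.out.println("Consumed " + consumed + " records");
        }
    }
}
{code}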



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-04-03 Thread Dong Lin
Hey John,

Thanks much for your comments!!

I have yet to go through the emails of John/Jun/Guozhang in detail. But let
me present my idea for how to minimize the delay for state loading for
stream use-case.

For ease of understanding, let's assume that the initial partition number
of the input topics and the change log topic are both 10, and the initial number
of stream processors is also 10. If we only increase the partition number of the
input topics to 15 without changing the number of stream processors, the
current KIP already guarantees in-order delivery and no state needs to be
moved between consumers for the stream use case. Next, let's say we want to
increase the number of processors to expand the processing capacity for
the stream use case. This requires us to move state between processors, which
will take time. Our goal is to minimize the impact (i.e. delay) on
processing while we increase the number of processors.

Note that a stream processor generally includes both a consumer and a producer.
In addition to consuming from the input topic, the consumer may also need to
consume from the change log topic on startup for recovery. And the producer may
produce state to the change log topic.

The solution will include the following steps:

1) Increase the partition number of the input topic from 10 to 15. Since
messages with the same key will still go to the same consumer before and
after the partition expansion, this step can be done without having to move
state between processors.

2) Increase the partition number of the change log topic from 10 to 15. Note
that this step can also be done without impacting the existing workflow. After
we increase the partition number of the change log topic, the key space may
split and some keys will be produced to the newly added partitions. But the
same key will still go to the same processor (i.e. consumer) before and after
the partition expansion. Thus this step can also be done without having to
move state between processors.

3) Now, let's add 5 new consumers whose groupId is different from the
existing processors' groupId. Thus these new consumers will not impact the
existing workflow. Each of these new consumers should consume two
partitions from the earliest offset, where these two partitions are the
same partitions that would be consumed if the consumers had the same
groupId as the existing processors' groupId. For example, the first of the
five consumers will consume partition 0 and partition 10. The purpose of
these consumers is to rebuild the state (e.g. RocksDB) for the processors
in advance (a rough sketch follows below). Also note that, by design of the
current KIP, each consumer will consume the existing partition of the change
log topic up to the offset before the partition expansion. Then they will only
need to consume the state of the new partition of the change log topic.
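
A rough sketch of step 3), as I picture it; the topic name, group id and
catch-up check are placeholders, not part of the KIP:

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class StateWarmupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "warmup-group"); // different from the existing processors' groupId

        try (KafkaConsumer<byte[], byte[]> consumer =
                 new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            // The first of the five new consumers reads partition 0 and partition 10
            // of the change log topic from the earliest offset to rebuild state in advance.
            List<TopicPartition> partitions = Arrays.asList(
                new TopicPartition("changelog", 0),
                new TopicPartition("changelog", 10));
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            // Stop once we have caught up to the end offsets observed at this point.
            Map<TopicPartition, Long> target = consumer.endOffsets(partitions);
            boolean caughtUp = false;
            while (!caughtUp) {
                for (ConsumerRecord<byte[], byte[]> record : consumer.poll(100)) {
                    // apply the record to the local state store (e.g. RocksDB) -- omitted
                }
                caughtUp = partitions.stream()
                    .allMatch(tp -> consumer.position(tp) >= target.get(tp));
            }
        }
    }
}
{code}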

4) After the consumers have caught up in step 3), we should stop these
consumers and add 5 new processors to the stream processing job. These 5
new processors should run in the same location as the previous 5 consumers
to re-use the state (e.g. RocksDB). And these processors' consumers should
consume partitions of the change log topic from the offsets committed by the
previous 5 consumers so that no state is missed.

One important trick to note here is that the mapping from partition to
consumer should also use linear hashing. And we need to remember the
initial number of processors in the job, 10 in this example, and use this
number in the linear hashing algorithm. This is pretty much the same as how
we use linear hashing to map keys to partitions. In this case, we get an
identity map from partition -> processor, for both the input topic and the
change log topic. For example, processor 12 will consume partition 12 of
the input topic and produce state to partition 12 of the change log topic.
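
To make the identity-map claim concrete, here is a small illustrative sketch of
linear hashing for the partition -> processor assignment (my own illustration
of the idea, not code from the KIP):

{code:java}
public class LinearHashingSketch {
    // 'initial' is the number of processors the job started with; 'current' is the
    // number after expansion (initial <= current <= 2 * initial in this sketch).
    static int processorFor(int partition, int initial, int current) {
        int p = partition % initial;
        // Processors 0 .. (current - initial - 1) have been "split", so their entries
        // are re-hashed with the doubled modulus, exactly as linear hashing splits buckets.
        if (p < current - initial) {
            p = partition % (2 * initial);
        }
        return p;
    }

    public static void main(String[] args) {
        // With initial = 10 and current = 15, every partition 0..14 maps to the
        // processor with the same number, i.e. the identity map described above.
        for (int partition = 0; partition < 15; partition++) {
            System.out.println(partition + " -> " + processorFor(partition, 10, 15));
        }
    }
}
{code}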

There are a few important properties of this solution to note:

- We can increase the number of partitions for the input topic and the change
log topic in any order, asynchronously.
- The expansion of the processors in a given job in step 4) only requires
step 3) for the same job. It does not require coordination across
different jobs for steps 3) and 4). Thus different jobs can independently
expand their capacity without waiting for each other.
- The logic for 1) and 2) is already supported in the current KIP. The
logic for 3) and 4) appears to be independent of the core Kafka logic and
can be implemented separately outside core Kafka. Thus the current KIP is
probably sufficient once we agree on the efficiency and the correctness of
the solution. We can have a separate KIP for Kafka Streams to support 3) and
4).


Cheers,
Dong


On Mon, Apr 2, 2018 at 3:25 PM, Guozhang Wang  wrote:

> Hey guys, just sharing my two cents here (I promise it will be shorter than
> John's article :).
>
> 0. Just to quickly recap, the main discussion point now is how to support
> "key partitioning preservation" (John's #4 in topic characteristics above)
> beyond the "single-key ordering preservation" that KIP-253 was originally

[jira] [Resolved] (KAFKA-6728) Kafka Connect Header Null Pointer Exception

2018-04-03 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-6728.
--
   Resolution: Fixed
Fix Version/s: 1.1.1
   1.2.0

Issue resolved by pull request 4815
[https://github.com/apache/kafka/pull/4815]

> Kafka Connect Header Null Pointer Exception
> ---
>
> Key: KAFKA-6728
> URL: https://issues.apache.org/jira/browse/KAFKA-6728
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.1.0
> Environment: Linux Mint
>Reporter: Philippe Hong
>Priority: Critical
> Fix For: 1.2.0, 1.1.1
>
>
> I am trying to use the newly released Kafka Connect that supports headers by 
> using the standalone connector to write to a text file (so in this case I am 
> only using the sink component)
> I am sadly greeted by a NullPointerException :
> {noformat}
> ERROR WorkerSinkTask{id=local-file-sink-0} Task threw an uncaught and 
> unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
> java.lang.NullPointerException
>     at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertHeadersFor(WorkerSinkTask.java:501)
>     at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
>     at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
>     at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
>     at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
>     at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
>     at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I launched zookeeper and kafka 1.1.0 locally and sent a 
> ProducerRecord[String, Array[Byte]] using a KafkaProducer[String, 
> Array[Byte]] with a header that has a key and value.
> I can read the record with a console consumer as well as using a 
> KafkaConsumer (where in this case I can see the content of the header of the 
> record I sent previously) so no problem here.
> I only made two changes to the kafka configuration:
>      - I used the StringConverter for the key and the ByteArrayConverter for 
> the value. 
>      - I also changed the topic where the sink would connect to.
> If I forgot something please tell me so as it is the first time I am creating 
> an issue on Jira.
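
For reference, a minimal sketch of producing a record with a single header key
and value like the one described above (the topic and header names here are
placeholders, not taken from the report):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class HeaderProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (KafkaProducer<String, byte[]> producer =
                 new KafkaProducer<>(props, new StringSerializer(), new ByteArraySerializer())) {
            ProducerRecord<String, byte[]> record =
                new ProducerRecord<>("connect-test", "key", "value".getBytes());
            // Attach a header with a key and a value, as in the report.
            record.headers().add("my-header", "my-header-value".getBytes());
            producer.send(record);
        }
    }
}
{code}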



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Kafka Authorizer interface review

2018-04-03 Thread Mickael Maison
Hi all,

Over the past few months the IBM Message Hub team has "played quite a
bit" with the pluggable Authorizer interface and I'll try to give a
summary of our findings.

First, when implementing a custom Authorizer, we found it hard to get a
global view of all the Resource/Operation pairs required for each ApiKey. We
ended up building a table (by looking at KafkaApis.scala) of all the
combinations that can be triggered. We posted this table in the wiki,
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorizations,
hopefully that will help others too.

We found the overview it provides necessary and it should probably be
in the docs/javadocs.

The biggest limitation for us was the permissions required to create
topics. This is what we targeted with KIP-277:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API

Some of our other findings:
- There is now way to distinguish between topic and record deletion.
If a Principal has Delete on a Topic, it can do both. With regulations
like GDPR, we can expect the DeleteRecords API to gain popularity and
it's a bit scary that it also allows to delete the topic.
- We also can't distinguish between DescribeLogDirs, DescribeAcls and
ListGroups as they all require Describe on the Cluster resource.
While ListGroups is pretty common for "normal" users, the other 2 are
a bit more on the admin side.
- OffsetCommit only requires Read on Group even though it's
technically a write operation. I think this was already discussed at
some point on the mailing list.
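
To make the overlaps concrete, here is a tiny sketch (my own illustration, not
the actual table from the wiki page) of the kind of ApiKey -> required
authorization mapping we ended up building, limited to the ambiguous cases
listed above:

{code:java}
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.protocol.ApiKeys;

public class RequiredAclsSketch {
    public static void main(String[] args) {
        Map<ApiKeys, List<String>> required = new EnumMap<>(ApiKeys.class);
        // Delete on Topic covers both record and topic deletion -- they cannot be granted separately.
        required.put(ApiKeys.DELETE_RECORDS, Arrays.asList("Delete on Topic"));
        required.put(ApiKeys.DELETE_TOPICS, Arrays.asList("Delete on Topic"));
        // All three of these only need Describe on Cluster, mixing admin and "normal" user operations.
        required.put(ApiKeys.DESCRIBE_LOG_DIRS, Arrays.asList("Describe on Cluster"));
        required.put(ApiKeys.DESCRIBE_ACLS, Arrays.asList("Describe on Cluster"));
        required.put(ApiKeys.LIST_GROUPS, Arrays.asList("Describe on Cluster"));
        // A write operation gated by a Read permission.
        required.put(ApiKeys.OFFSET_COMMIT, Arrays.asList("Read on Group"));
        required.forEach((api, acls) -> System.out.println(api + " -> " + acls));
    }
}
{code}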

Changing permissions is an expensive process and so far we've not
attempted to come up with alternatives (apart from KIP-277). There is
also a balance between granularity and ease of use: requiring
administrators to set and maintain many permissions is not really an
improvement!

Thanks


Re: Kafka Authorizer interface review

2018-04-03 Thread Ted Yu
bq. There is now way to distinguish between topic and record deletion.

I guess you meant 'no way' above.
I think deleting a topic has higher impact than deleting records.

Have you considered filing KIP to distinguish the two operations ?

Cheers

On Tue, Apr 3, 2018 at 9:22 AM, Mickael Maison 
wrote:

> Hi all,
>
> Over the past few months the IBM Message Hub team has "played quite a
> bit" with the pluggable Authorizer interface and I'll try to give a
> summary of our findings.
>
> First when implementing a custom Authorizer, we found it hard having a
> global view of all the Resource/Operation required for each ApiKey. We
> ended up building a table (by looking at KafkaApis.scala) of all the
> combinations that can be triggered. We posted this table in the wiki,
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorizations,
> hopefully that will help others too.
>
> We found the overview it provides necessary and it should probably be
> in the docs/javadocs.
>
> The biggest limitation for us were the permissions required to create
> topics. This is what we targeted with KIP-277:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 277+-+Fine+Grained+ACL+for+CreateTopics+API
>
> Some of our other findings:
> - There is now way to distinguish between topic and record deletion.
> If a Principal has Delete on a Topic, it can do both. With regulations
> like GDPR, we can expect the DeleteRecords API to gain popularity and
> it's a bit scary that it also allows to delete the topic.
> - We also can't distinguish between DescribeLogDirs, DescribeAcls and
> ListGroups as they both require Describe on the Cluster resource.
> While ListGroups is pretty common for "normal" users, the other 2 are
> a bit more on the admin side.
> - OffsetCommit only requires Read on Group even though it's
> technically a write operation. I think this was already discussed at
> some point on the mailing list.
>
> Changing permissions is an expensive process and so far we've not
> attempted to come up with alternatives (apart from KIP-277). There is
> also a balance between granularity and ease of use, requiring
> administrators to set and maintain many permissions is not really an
> improvement!
>
> Thanks
>


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-03 Thread Randall Hauch
Matt,

Let me play devil's advocate. Do we need this additional complexity? The
motivation section talked about needing to deal with task failures due to
connectivity problems. Generally speaking, isn't it possible that if one
task has connectivity problems with either the broker or the external
system that other tasks would as well? And in the particular case of S3, is
it possible to try and prevent the task shutdown in the first place,
perhaps by improving how the S3 connector retries? (We did this in the
Elasticsearch connector using backoff with jitter; see
https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for
details.)
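
For anyone unfamiliar with the pattern, a rough sketch of exponential backoff
with "full jitter"; this is an illustration, not the Elasticsearch connector's
actual code:

{code:java}
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {
    // The sleep before retry attempt n is a random value in [0, min(maxMs, baseMs * 2^n)],
    // which spreads retries out and avoids thundering-herd behaviour.
    static long backoffWithJitterMs(int attempt, long baseMs, long maxMs) {
        long cap = Math.min(maxMs, baseMs * (1L << Math.min(attempt, 32)));
        return ThreadLocalRandom.current().nextLong(0, cap + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> sleep "
                + backoffWithJitterMs(attempt, 100, 10_000) + " ms");
        }
    }
}
{code}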

Best regards,

Randall

On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer  wrote:

> I have made the requested updates to the KIP! :)
>
> On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer  wrote:
>
> > Ugh
> >
> > * I can update
> >
> > I need more caffeine...
> >
> > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer  wrote:
> >
> >> I'm can update the rejected alternatives section as you describe.
> >>
> >> Also, adding a paragraph to the preCommit javadoc also seems like a
> >> Very Very Good Idea™ so I'll make that update to the KIP as well.
> >>
> >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch 
> wrote:
> >>
> >>> Thanks for the KIP proposal, Matt.
> >>>
> >>> You mention in the "Rejected Alternatives" section that you considered
> >>> changing the signature of the `preCommit` method but rejected it
> because
> >>> it
> >>> would break backward compatibility. Strictly speaking, it is possible
> to
> >>> do
> >>> this without breaking compatibility by introducing a new `preCommit`
> >>> method, deprecating the old one, and having the new implementation call
> >>> the
> >>> old one. Such an approach would be complicated, and I'm not sure it
> adds
> >>> any value. In fact, one of the benefits of having a context object is
> >>> that
> >>> we can add methods like the one you're proposing without causing any
> >>> compatibility issues. Anyway, it probably is worth updating this
> rejected
> >>> alternative to be a bit more precise.
> >>>
> >>> Otherwise, I think this is a good approach, though I'd request that we
> >>> update the `preCommit` JavaDoc to add a paragraph that explains this
> >>> scenario. Thoughts?
> >>>
> >>> Randall
> >>>
> >>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu  wrote:
> >>>
> >>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
> >>> should
> >>> > suffice for now.
> >>> >
> >>> > Thanks
> >>> >
> >>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer  wrote:
> >>> >
> >>> > > Hey Ted,
> >>> > >
> >>> > > I have not, actually!
> >>> > >
> >>> > > Do you think that we're likely to add multiple states here soon?
> >>> > >
> >>> > > My instinct is to keep it simple until there are multiple states
> >>> that we
> >>> > > would want
> >>> > > to consider. I really like the simplicity of just getting a boolean
> >>> and
> >>> > the
> >>> > > implementation of WorkerSinkTask already passes around a boolean to
> >>> > > indicate this is happening internally. We're really just shuttling
> >>> that
> >>> > > value into
> >>> > > the context at the correct moments.
> >>> > >
> >>> > > Once we have multiple states, we could choose to provide a more
> >>> > > appropriately
> >>> > > named method (e.g. getState?) and reimplement isClosing by checking
> >>> that
> >>> > > enum
> >>> > > without breaking compatibility.
> >>> > >
> >>> > > However, if we think multiple states here are imminent for some
> >>> reason, I
> >>> > > would
> >>> > > be pretty easy to convince adding that would be worth the extra
> >>> > complexity!
> >>> > > :)
> >>> > >
> >>> > > Matt
> >>> > >
> >>> > > —
> >>> > > Matt Farmer | Blog  | Twitter
> >>> > > 
> >>> > > GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07
> >>> > >
> >>> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu 
> >>> wrote:
> >>> > >
> >>> > > > The enhancement gives SinkTaskContext state information.
> >>> > > >
> >>> > > > Have you thought of exposing the state retrieval as an enum
> >>> (initially
> >>> > > with
> >>> > > > two values) ?
> >>> > > >
> >>> > > > Thanks
> >>> > > >
> >>> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer 
> wrote:
> >>> > > >
> >>> > > > > Hello all,
> >>> > > > >
> >>> > > > > I am proposing KIP-275 to improve Connect's SinkTaskContext so
> >>> that
> >>> > > Sinks
> >>> > > > > can be informed
> >>> > > > > in their preCommit hook if the hook is being invoked as a part
> >>> of a
> >>> > > > > rebalance or Connect
> >>> > > > > shutdown.
> >>> > > > >
> >>> > > > > The KIP is here:
> >>> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> >>> > > > action?pageId=75977607
> >>> > > > >
> >>> > > > > Please let me know what feedback y'all have. Thanks!
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>


Re: Kafka Authorizer interface review

2018-04-03 Thread Mickael Maison
Yes this is indeed a typo!

And yes we're considering filing another KIP but I thought collecting
all our feedback and providing a full summary might be beneficial for
others.
I see you too are concerned about the current delete record/topic limitation.

On Tue, Apr 3, 2018 at 5:26 PM, Ted Yu  wrote:
> bq. There is now way to distinguish between topic and record deletion.
>
> I guess you meant 'no way' above.
> I think deleting a topic has higher impact than deleting records.
>
> Have you considered filing KIP to distinguish the two operations ?
>
> Cheers
>
> On Tue, Apr 3, 2018 at 9:22 AM, Mickael Maison 
> wrote:
>
>> Hi all,
>>
>> Over the past few months the IBM Message Hub team has "played quite a
>> bit" with the pluggable Authorizer interface and I'll try to give a
>> summary of our findings.
>>
>> First when implementing a custom Authorizer, we found it hard having a
>> global view of all the Resource/Operation required for each ApiKey. We
>> ended up building a table (by looking at KafkaApis.scala) of all the
>> combinations that can be triggered. We posted this table in the wiki,
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorizations,
>> hopefully that will help others too.
>>
>> We found the overview it provides necessary and it should probably be
>> in the docs/javadocs.
>>
>> The biggest limitation for us were the permissions required to create
>> topics. This is what we targeted with KIP-277:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 277+-+Fine+Grained+ACL+for+CreateTopics+API
>>
>> Some of our other findings:
>> - There is now way to distinguish between topic and record deletion.
>> If a Principal has Delete on a Topic, it can do both. With regulations
>> like GDPR, we can expect the DeleteRecords API to gain popularity and
>> it's a bit scary that it also allows to delete the topic.
>> - We also can't distinguish between DescribeLogDirs, DescribeAcls and
>> ListGroups as they both require Describe on the Cluster resource.
>> While ListGroups is pretty common for "normal" users, the other 2 are
>> a bit more on the admin side.
>> - OffsetCommit only requires Read on Group even though it's
>> technically a write operation. I think this was already discussed at
>> some point on the mailing list.
>>
>> Changing permissions is an expensive process and so far we've not
>> attempted to come up with alternatives (apart from KIP-277). There is
>> also a balance between granularity and ease of use, requiring
>> administrators to set and maintain many permissions is not really an
>> improvement!
>>
>> Thanks
>>


Build failed in Jenkins: kafka-trunk-jdk8 #2521

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] MINOR: Add Timed wait to

--
[...truncated 422.76 KB...]
kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline STARTED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry STARTED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics PASSED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion STARTED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest > testNoOpAction PASSED

kafka.network.SocketServerTest > simpleRequest STARTED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > closingChannelException STARTED

kafka.network.SocketServerTest > closingChannelException PASSED

kafka.network.SocketServerTest > testIdleConnection STARTED

kafka.network.SocketServerTest > testIdleConnection PASSED

kafka.network.SocketServerTest > 
testClientDisconnectionWithStagedReceivesFullyProcessed STARTED

kafka.network.SocketServerTest > 
testClientDisconnectionWithStagedReceivesFullyProcessed PASSED

kafka.network.SocketServerTest > testZeroMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testZeroMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testMetricCollectionAfterShutdown STARTED

kafka.network.SocketServerTest > testMetricCollectionAfterShutdown PASSED

kafka.network.SocketServerTest > testSessionPrincipal STARTED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > configureNewConnectionException STARTED

kafka.network.SocketServerTest > configureNewConnectionException PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > processNewResponseException STARTED

kafka.network.SocketServerTest > processNewResponseException PASSED

kafka.network.SocketServerTest > processCompletedSendException STARTED

Re: Kafka Authorizer interface review

2018-04-03 Thread Ted Yu
bq. you too are concerned about the current delete record/topic limitation

Yes.
I think this is a security hole.

On Tue, Apr 3, 2018 at 9:37 AM, Mickael Maison 
wrote:

> Yes this is indeed a typo!
>
> And yes we're considering filing another KIP but I thought collecting
> all our feedback and providing a full summary might be beneficial for
> others.
> I see you too are concerned about the current delete record/topic
> limitation.
>
> On Tue, Apr 3, 2018 at 5:26 PM, Ted Yu  wrote:
> > bq. There is now way to distinguish between topic and record deletion.
> >
> > I guess you meant 'no way' above.
> > I think deleting a topic has higher impact than deleting records.
> >
> > Have you considered filing KIP to distinguish the two operations ?
> >
> > Cheers
> >
> > On Tue, Apr 3, 2018 at 9:22 AM, Mickael Maison  >
> > wrote:
> >
> >> Hi all,
> >>
> >> Over the past few months the IBM Message Hub team has "played quite a
> >> bit" with the pluggable Authorizer interface and I'll try to give a
> >> summary of our findings.
> >>
> >> First when implementing a custom Authorizer, we found it hard having a
> >> global view of all the Resource/Operation required for each ApiKey. We
> >> ended up building a table (by looking at KafkaApis.scala) of all the
> >> combinations that can be triggered. We posted this table in the wiki,
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorizations,
> >> hopefully that will help others too.
> >>
> >> We found the overview it provides necessary and it should probably be
> >> in the docs/javadocs.
> >>
> >> The biggest limitation for us were the permissions required to create
> >> topics. This is what we targeted with KIP-277:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 277+-+Fine+Grained+ACL+for+CreateTopics+API
> >>
> >> Some of our other findings:
> >> - There is now way to distinguish between topic and record deletion.
> >> If a Principal has Delete on a Topic, it can do both. With regulations
> >> like GDPR, we can expect the DeleteRecords API to gain popularity and
> >> it's a bit scary that it also allows to delete the topic.
> >> - We also can't distinguish between DescribeLogDirs, DescribeAcls and
> >> ListGroups as they both require Describe on the Cluster resource.
> >> While ListGroups is pretty common for "normal" users, the other 2 are
> >> a bit more on the admin side.
> >> - OffsetCommit only requires Read on Group even though it's
> >> technically a write operation. I think this was already discussed at
> >> some point on the mailing list.
> >>
> >> Changing permissions is an expensive process and so far we've not
> >> attempted to come up with alternatives (apart from KIP-277). There is
> >> also a balance between granularity and ease of use, requiring
> >> administrators to set and maintain many permissions is not really an
> >> improvement!
> >>
> >> Thanks
> >>
>


Re: Kafka Authorizer interface review

2018-04-03 Thread Vahid S Hashemian
Hi Mickael,

Thanks for the detailed description of these authorization issues.
I agree they need to be reviewed and fixed in the areas you specified, or
even at a higher level that simplifies their maintenance as the matrix is
expanded or needs to be modified.
FYI, KIP-231 also attempts to address the issue with the ListGroups API in
a backward-compatible way.

--Vahid




From:   Mickael Maison 
To: dev 
Date:   04/03/2018 09:22 AM
Subject:Kafka Authorizer interface review



Hi all,

Over the past few months the IBM Message Hub team has "played quite a
bit" with the pluggable Authorizer interface and I'll try to give a
summary of our findings.

First when implementing a custom Authorizer, we found it hard having a
global view of all the Resource/Operation required for each ApiKey. We
ended up building a table (by looking at KafkaApis.scala) of all the
combinations that can be triggered. We posted this table in the wiki,
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorizations,
hopefully that will help others too.

We found the overview it provides necessary and it should probably be
in the docs/javadocs.

The biggest limitation for us were the permissions required to create
topics. This is what we targeted with KIP-277:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API


Some of our other findings:
- There is now way to distinguish between topic and record deletion.
If a Principal has Delete on a Topic, it can do both. With regulations
like GDPR, we can expect the DeleteRecords API to gain popularity and
it's a bit scary that it also allows to delete the topic.
- We also can't distinguish between DescribeLogDirs, DescribeAcls and
ListGroups as they both require Describe on the Cluster resource.
While ListGroups is pretty common for "normal" users, the other 2 are
a bit more on the admin side.
- OffsetCommit only requires Read on Group even though it's
technically a write operation. I think this was already discussed at
some point on the mailing list.

Changing permissions is an expensive process and so far we've not
attempted to come up with alternatives (apart from KIP-277). There is
also a balance between granularity and ease of use, requiring
administrators to set and maintain many permissions is not really an
improvement!

Thanks







Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-03 Thread Matt Farmer
Hey Randall,

Devil's advocate sparring is always a fun game so I'm down. =)

A rebalance caused by a connectivity failure is the case that caused us to
notice the issue. But the issue
is really more about giving connectors the tools to handle rebalances
or a Kafka Connect shutdown
cleanly. Perhaps that wasn't clear in the KIP.

In our case timeouts were *not* uniformly affecting tasks. But every time a
timeout occurred in one task,
all tasks lost whatever forward progress they had made. So, yes, in the
specific case of timeouts a
backoff jitter in the connector is a solution for that particular issue.
However, this KIP *also* gives connectors
enough information to behave intelligently during any kind of rebalance or
shutdown because they're able
to discover that preCommit is being invoked for that specific reason. (As
opposed to being invoked
during normal operation.)
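
For example, a sink task could use the proposed flag roughly like this (a sketch
only; the accessor name is whatever the KIP settles on, and flushAllBuffers() is
a hypothetical connector-specific method):

{code:java}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class ExampleSinkTask extends SinkTask {
    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { }
    @Override public void put(Collection<SinkRecord> records) { /* buffer records */ }
    @Override public void stop() { }

    @Override
    public Map<TopicPartition, OffsetAndMetadata> preCommit(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        if (context.isClosing()) { // accessor proposed by KIP-275; the final name may differ
            // Partitions are about to be revoked (rebalance or shutdown), so flush
            // everything now and no forward progress is lost.
            flushAllBuffers();
        }
        return super.preCommit(currentOffsets);
    }

    private void flushAllBuffers() { /* connector-specific; hypothetical */ }
}
{code}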

On Tue, Apr 3, 2018 at 12:36 PM, Randall Hauch  wrote:

> Matt,
>
> Let me play devil's advocate. Do we need this additional complexity? The
> motivation section talked about needing to deal with task failures due to
> connectivity problems. Generally speaking, isn't it possible that if one
> task has connectivity problems with either the broker or the external
> system that other tasks would as well? And in the particular case of S3, is
> it possible to try and prevent the task shutdown in the first place,
> perhaps by improving how the S3 connector retries? (We did this in the
> Elasticsearch connector using backoff with jitter; see
> https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for
> details.)
>
> Best regards,
>
> Randall
>
> On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer  wrote:
>
> > I have made the requested updates to the KIP! :)
> >
> > On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer  wrote:
> >
> > > Ugh
> > >
> > > * I can update
> > >
> > > I need more caffeine...
> > >
> > > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer  wrote:
> > >
> > >> I'm can update the rejected alternatives section as you describe.
> > >>
> > >> Also, adding a paragraph to the preCommit javadoc also seems like a
> > >> Very Very Good Idea™ so I'll make that update to the KIP as well.
> > >>
> > >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch 
> > wrote:
> > >>
> > >>> Thanks for the KIP proposal, Matt.
> > >>>
> > >>> You mention in the "Rejected Alternatives" section that you
> considered
> > >>> changing the signature of the `preCommit` method but rejected it
> > because
> > >>> it
> > >>> would break backward compatibility. Strictly speaking, it is possible
> > to
> > >>> do
> > >>> this without breaking compatibility by introducing a new `preCommit`
> > >>> method, deprecating the old one, and having the new implementation
> call
> > >>> the
> > >>> old one. Such an approach would be complicated, and I'm not sure it
> > adds
> > >>> any value. In fact, one of the benefits of having a context object is
> > >>> that
> > >>> we can add methods like the one you're proposing without causing any
> > >>> compatibility issues. Anyway, it probably is worth updating this
> > rejected
> > >>> alternative to be a bit more precise.
> > >>>
> > >>> Otherwise, I think this is a good approach, though I'd request that
> we
> > >>> update the `preCommit` JavaDoc to add a paragraph that explains this
> > >>> scenario. Thoughts?
> > >>>
> > >>> Randall
> > >>>
> > >>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu  wrote:
> > >>>
> > >>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
> > >>> should
> > >>> > suffice for now.
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer  wrote:
> > >>> >
> > >>> > > Hey Ted,
> > >>> > >
> > >>> > > I have not, actually!
> > >>> > >
> > >>> > > Do you think that we're likely to add multiple states here soon?
> > >>> > >
> > >>> > > My instinct is to keep it simple until there are multiple states
> > >>> that we
> > >>> > > would want
> > >>> > > to consider. I really like the simplicity of just getting a
> boolean
> > >>> and
> > >>> > the
> > >>> > > implementation of WorkerSinkTask already passes around a boolean
> to
> > >>> > > indicate this is happening internally. We're really just
> shuttling
> > >>> that
> > >>> > > value into
> > >>> > > the context at the correct moments.
> > >>> > >
> > >>> > > Once we have multiple states, we could choose to provide a more
> > >>> > > appropriately
> > >>> > > named method (e.g. getState?) and reimplement isClosing by
> checking
> > >>> that
> > >>> > > enum
> > >>> > > without breaking compatibility.
> > >>> > >
> > >>> > > However, if we think multiple states here are imminent for some
> > >>> reason, I
> > >>> > > would
> > >>> > > be pretty easy to convince adding that would be worth the extra
> > >>> > complexity!
> > >>> > > :)
> > >>> > >
> > >>> > > Matt
> > >>> > >
> > >>> > > —
> > >>> > > Matt Farmer | Blog  | Twitter
> > >>> > > 
> > >>> >

[jira] [Resolved] (KAFKA-6739) Broker receives error when handling request with java.lang.IllegalArgumentException: Magic v0 does not support record headers

2018-04-03 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-6739.

   Resolution: Fixed
Fix Version/s: 1.0.2

> Broker receives error when handling request with 
> java.lang.IllegalArgumentException: Magic v0 does not support record headers
> -
>
> Key: KAFKA-6739
> URL: https://issues.apache.org/jira/browse/KAFKA-6739
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.0
>Reporter: Koelli Mungee
>Assignee: Dhruvil Shah
>Priority: Critical
> Fix For: 1.0.2, 1.2.0, 1.1.1
>
>
> A broker running at 1.0.0 with the following properties 
>  
> {code:java}
> log.message.format.version=1.0
> inter.broker.protocol.version=1.0
> {code}
> receives this ERROR while handling fetch request for a message with a header
> {code:java}
> [2018-03-23 01:48:03,093] ERROR [KafkaApi-1] Error when handling request 
> {replica_id=-1,max_wait_time=100,min_bytes=1,topics=[{topic=test=[{partition=11,fetch_offset=20645,max_bytes=1048576}]}]}
>  (kafka.server.KafkaApis) java.lang.IllegalArgumentException: Magic v0 does 
> not support record headers 
> at 
> org.apache.kafka.common.record.MemoryRecordsBuilder.appendWithOffset(MemoryRecordsBuilder.java:403)
>  
> at 
> org.apache.kafka.common.record.MemoryRecordsBuilder.append(MemoryRecordsBuilder.java:586)
>  
> at 
> org.apache.kafka.common.record.AbstractRecords.convertRecordBatch(AbstractRecords.java:134)
>  
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:109)
>  
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253) 
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:520)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:518)
>  
> at scala.Option.map(Option.scala:146) 
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:518)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:508)
>  
> at scala.Option.flatMap(Option.scala:171) 
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$convertedPartitionData$1(KafkaApis.scala:508)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:556)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:555)
>  
> at scala.collection.Iterator$class.foreach(Iterator.scala:891) 
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) 
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) 
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54) 
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$createResponse$2(KafkaApis.scala:555)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
>  
> at 
> kafka.server.KafkaApis$$anonfun$sendResponseMaybeThrottle$1.apply$mcVI$sp(KafkaApis.scala:2034)
>  
> at 
> kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
>  
> at kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2033) 
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$fetchResponseCallback$1(KafkaApis.scala:569)
>  
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$processResponseCallback$1$1.apply$mcVI$sp(KafkaApis.scala:588)
>  
> at 
> kafka.server.ClientQuotaManager.maybeRecordAndThrottle(ClientQuotaManager.scala:175)
>  
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$processResponseCallback$1(KafkaApis.scala:587)
>  
> at 
> kafka.server.KafkaApis$$anonfun$handleFetchRequest$3.apply(KafkaApis.scala:604)
>  
> at 
> kafka.server.KafkaApis$$anonfun$handleFetchRequest$3.apply(KafkaApis.scala:604)
>  
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:820) 
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:596) 
> at kafka.server.KafkaApis.handle(KafkaApis.scala:100) 
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:65) 
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6744) MockProducer with transaction enabled doesn't fail on commit if a record was failed

2018-04-03 Thread JIRA
Pascal Gélinas created KAFKA-6744:
-

 Summary: MockProducer with transaction enabled doesn't fail on 
commit if a record was failed
 Key: KAFKA-6744
 URL: https://issues.apache.org/jira/browse/KAFKA-6744
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 1.0.0
Reporter: Pascal Gélinas


The KafkaProducer#send documentation states the following:

When used as part of a transaction, it is not necessary to define a callback or 
check the result of the future in order to detect errors from send. If any of 
the send calls failed with an irrecoverable error, the final 
commitTransaction() call will fail and throw the exception from the last failed 
send.

So I was expecting the following to throw an exception:

{code:java}
MockProducer<String, byte[]> producer = new MockProducer<>(false,
    new StringSerializer(), new ByteArraySerializer());
producer.initTransactions();
producer.beginTransaction();
producer.send(new ProducerRecord<>("foo", new byte[]{}));
producer.errorNext(new RuntimeException());
producer.commitTransaction(); // Expecting this to throw
{code}

Unfortunately, the commitTransaction() call returns successfully.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
I agree we should add as much information as is reasonable to the log. For
example, see this WIP PR I started for this KIP:

https://github.com/apache/kafka/pull/4812/files#diff-88d129f048bc842c7db5b2566a45fce8R80

and

https://github.com/apache/kafka/pull/4812/files#diff-69e6789eb675ec978a1abd24fed96eb1R111

I'm not sure if we should nail down the log messages in the KIP or in the
PR discussion. What say you?
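
Roughly, the shape I have in mind is something like the following. This is
illustrative only -- the method, fields and message wording are placeholders
rather than the PR's exact code; it assumes an slf4j Logger named "log" and
a Sensor already registered against the skipped-records metric:

    private void skipRecord(ConsumerRecord<byte[], byte[]> rawRecord, Exception cause) {
        // Include enough context (topic/partition/offset plus the cause) for the
        // user to pinpoint the offending record from the log line alone.
        log.warn("Skipping record due to deserialization error. "
                + "topic=[{}] partition=[{}] offset=[{}]",
            rawRecord.topic(), rawRecord.partition(), rawRecord.offset(), cause);
        skippedRecordsSensor.record();
    }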

Thanks,
-John

On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax 
wrote:

> Thanks for sharing your thoughts. As I mentioned originally, I am not
> sure about the right log level either. Your arguments are convincing --
> thus, I am fine with keeping WARN level.
>
> The task vs thread level argument is an interesting one. However, I am
> wondering if we should add this information into the corresponding WARN
> logs that we write anyway? For this case, we can also log the
> corresponding operator (and other information like topic name etc if
> needed). WDYT about this?
>
>
> -Matthias
>
> On 4/2/18 8:31 PM, Guozhang Wang wrote:
> > Regarding logging: I'm inclined to keep logging at WARN level since
> skipped
> > records are not expected in normal execution (for all reasons that we are
> > aware of), and hence when error happens users should be alerted from
> > metrics and looked into the log files, so to me if it is really spamming
> > the log files it is also a good alert for users. Besides for deserialize
> > errors we already log at WARN level for this reason.
> >
> > Regarding the metrics-levels: I was pondering on that as well. What made
> me
> > to think and agree on task-level than thread-level is that for some
> reasons
> > like window retention, they may possibly be happening on a subset of
> input
> > partitions, and tasks are correlated with partitions the task-level
> metrics
> > can help users to narrow down on the specific input data partitions.
> >
> >
> > Guozhang
> >
> >
> > On Mon, Apr 2, 2018 at 6:43 PM, John Roesler  wrote:
> >
> >> Hi Matthias,
> >>
> >> No worries! Thanks for the reply.
> >>
> >> 1) There isn't a connection. I tried using the TopologyTestDriver to
> write
> >> a quick test exercising the current behavior and discovered that the
> >> metrics weren't available. It seemed like they should be, so I tacked
> it on
> >> to this KIP. If you feel it's inappropriate, I can pull it back out.
> >>
> >> 2) I was also concerned about that, but I figured it would come up in
> >> discussion if I just went ahead and proposed it. And here we are!
> >>
> >> Here's my thought: maybe there are two classes of skips: "controlled"
> and
> >> "uncontrolled", where "controlled" means, as an app author, I
> deliberately
> >> filter out some events, and "uncontrolled" means that I simply don't
> >> account for some feature of the data, and the framework skips them (as
> >> opposed to crashing).
> >>
> >> In this breakdowns, the skips I'm adding metrics for are all
> uncontrolled
> >> skips (and we hope to measure all the uncontrolled skips). Our skips are
> >> well documented, so it wouldn't be terrible to have an application in
> which
> >> you know you expect to have tons of uncontrolled skips, but it's not
> great
> >> either, since you may also have some *unexpected* uncontrolled skips.
> It'll
> >> be difficult to notice, since you're probably not alerting on the metric
> >> and filtering out the logs (whatever their level).
> >>
> >> I'd recommend any app author, as an alternative, to convert all expected
> >> skips to controlled ones, by updating the topology to filter those
> records
> >> out.
> >>
> >> Following from my recommendation, as a library author, I'm inclined to
> mark
> >> those logs WARN, since in my opinion, they should be concerning to the
> app
> >> authors. I'd definitely want to show, rather than hide, them by
> default, so
> >> I would pick INFO at least.
> >>
> >> That said, logging is always a tricky issue for lower-level libraries
> that
> >> run inside user code, since we don't have all the information we need to
> >> make the right call.
> >>
> >>
> >>
> >> On your last note, yeah, I got that impression from Guozhang as well.
> >> Thanks for the clarification.
> >>
> >> -John
> >>
> >>
> >>
> >> On Mon, Apr 2, 2018 at 4:03 PM, Matthias J. Sax 
> >> wrote:
> >>
> >>> John,
> >>>
> >>> sorry for my late reply and thanks for updating the KIP.
> >>>
> >>> I like your approach about "metrics are for monitoring, logs are for
> >>> debugging" -- however:
> >>>
> >>> 1) I don't see a connection between this and the task-level metrics
> that
> >>> you propose to get the metrics in `TopologyTestDriver`. I don't think
> >>> people would monitor the `TopologyTestDriver` an thus wondering why it
> >>> is important to include the metrics there? Thread-level metric might be
> >>> easier to monitor though (ie, less different metric to monitor).
> >>>
> >>> 2) I am a little worried about WARN level logging and that it might be
> >>> too chatty -- as you pointed out, it's abo

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
I think Matthias' comment is that we can still record the metrics at the
thread level, while having the WARN log entry include sufficient context
information so that users can still easily narrow down the investigation
scope.


Guozhang

On Tue, Apr 3, 2018 at 11:22 AM, John Roesler  wrote:

> I agree we should add as much information as is reasonable to the log. For
> example, see this WIP PR I started for this KIP:
>
> https://github.com/apache/kafka/pull/4812/files#diff-
> 88d129f048bc842c7db5b2566a45fce8R80
>
> and
>
> https://github.com/apache/kafka/pull/4812/files#diff-
> 69e6789eb675ec978a1abd24fed96eb1R111
>
> I'm not sure if we should nail down the log messages in the KIP or in the
> PR discussion. What say you?
>
> Thanks,
> -John
>
> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax 
> wrote:
>
> > Thanks for sharing your thoughts. As I mentioned originally, I am not
> > sure about the right log level either. Your arguments are convincing --
> > thus, I am fine with keeping WARN level.
> >
> > The task vs thread level argument is an interesting one. However, I am
> > wondering if we should add this information into the corresponding WARN
> > logs that we write anyway? For this case, we can also log the
> > corresponding operator (and other information like topic name etc if
> > needed). WDYT about this?
> >
> >
> > -Matthias
> >
> > On 4/2/18 8:31 PM, Guozhang Wang wrote:
> > > Regarding logging: I'm inclined to keep logging at WARN level since
> > skipped
> > > records are not expected in normal execution (for all reasons that we
> are
> > > aware of), and hence when error happens users should be alerted from
> > > metrics and looked into the log files, so to me if it is really
> spamming
> > > the log files it is also a good alert for users. Besides for
> deserialize
> > > errors we already log at WARN level for this reason.
> > >
> > > Regarding the metrics-levels: I was pondering on that as well. What
> made
> > me
> > > to think and agree on task-level than thread-level is that for some
> > reasons
> > > like window retention, they may possibly be happening on a subset of
> > input
> > > partitions, and tasks are correlated with partitions the task-level
> > metrics
> > > can help users to narrow down on the specific input data partitions.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Apr 2, 2018 at 6:43 PM, John Roesler 
> wrote:
> > >
> > >> Hi Matthias,
> > >>
> > >> No worries! Thanks for the reply.
> > >>
> > >> 1) There isn't a connection. I tried using the TopologyTestDriver to
> > write
> > >> a quick test exercising the current behavior and discovered that the
> > >> metrics weren't available. It seemed like they should be, so I tacked
> > it on
> > >> to this KIP. If you feel it's inappropriate, I can pull it back out.
> > >>
> > >> 2) I was also concerned about that, but I figured it would come up in
> > >> discussion if I just went ahead and proposed it. And here we are!
> > >>
> > >> Here's my thought: maybe there are two classes of skips: "controlled"
> > and
> > >> "uncontrolled", where "controlled" means, as an app author, I
> > deliberately
> > >> filter out some events, and "uncontrolled" means that I simply don't
> > >> account for some feature of the data, and the framework skips them (as
> > >> opposed to crashing).
> > >>
> > >> In this breakdowns, the skips I'm adding metrics for are all
> > uncontrolled
> > >> skips (and we hope to measure all the uncontrolled skips). Our skips
> are
> > >> well documented, so it wouldn't be terrible to have an application in
> > which
> > >> you know you expect to have tons of uncontrolled skips, but it's not
> > great
> > >> either, since you may also have some *unexpected* uncontrolled skips.
> > It'll
> > >> be difficult to notice, since you're probably not alerting on the
> metric
> > >> and filtering out the logs (whatever their level).
> > >>
> > >> I'd recommend any app author, as an alternative, to convert all
> expected
> > >> skips to controlled ones, by updating the topology to filter those
> > records
> > >> out.
> > >>
> > >> Following from my recommendation, as a library author, I'm inclined to
> > mark
> > >> those logs WARN, since in my opinion, they should be concerning to the
> > app
> > >> authors. I'd definitely want to show, rather than hide, them by
> > default, so
> > >> I would pick INFO at least.
> > >>
> > >> That said, logging is always a tricky issue for lower-level libraries
> > that
> > >> run inside user code, since we don't have all the information we need
> to
> > >> make the right call.
> > >>
> > >>
> > >>
> > >> On your last note, yeah, I got that impression from Guozhang as well.
> > >> Thanks for the clarification.
> > >>
> > >> -John
> > >>
> > >>
> > >>
> > >> On Mon, Apr 2, 2018 at 4:03 PM, Matthias J. Sax <
> matth...@confluent.io>
> > >> wrote:
> > >>
> > >>> John,
> > >>>
> > >>> sorry for my late reply and thanks for updating the KIP.
> > >>>
> > >>> I like your approach about "

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
Oh, sorry, I missed the point.

Yeah, we can totally do that. The reason to move it to the task level was
mainly to make it available in the TopologyTestDriver's metrics as well.
But if we decide that's a non-goal, then there's no motivation to change it.

And actually that reminds me that we do have an open question about whether
I should add a metrics getter to the TopologyTestDriver's interface. WDYT?

Thanks,
-John

On Tue, Apr 3, 2018 at 1:26 PM, Guozhang Wang  wrote:

> I think Matthias' comment is that, we can still record the metrics on the
> thread-level, while having the WARN log entry to include sufficient context
> information so that users can still easily narrow down the investigation
> scope.
>
>
> Guozhang
>
> On Tue, Apr 3, 2018 at 11:22 AM, John Roesler  wrote:
>
> > I agree we should add as much information as is reasonable to the log.
> For
> > example, see this WIP PR I started for this KIP:
> >
> > https://github.com/apache/kafka/pull/4812/files#diff-
> > 88d129f048bc842c7db5b2566a45fce8R80
> >
> > and
> >
> > https://github.com/apache/kafka/pull/4812/files#diff-
> > 69e6789eb675ec978a1abd24fed96eb1R111
> >
> > I'm not sure if we should nail down the log messages in the KIP or in the
> > PR discussion. What say you?
> >
> > Thanks,
> > -John
> >
> > On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax 
> > wrote:
> >
> > > Thanks for sharing your thoughts. As I mentioned originally, I am not
> > > sure about the right log level either. Your arguments are convincing --
> > > thus, I am fine with keeping WARN level.
> > >
> > > The task vs thread level argument is an interesting one. However, I am
> > > wondering if we should add this information into the corresponding WARN
> > > logs that we write anyway? For this case, we can also log the
> > > corresponding operator (and other information like topic name etc if
> > > needed). WDYT about this?
> > >
> > >
> > > -Matthias
> > >
> > > On 4/2/18 8:31 PM, Guozhang Wang wrote:
> > > > Regarding logging: I'm inclined to keep logging at WARN level since
> > > skipped
> > > > records are not expected in normal execution (for all reasons that we
> > are
> > > > aware of), and hence when error happens users should be alerted from
> > > > metrics and looked into the log files, so to me if it is really
> > spamming
> > > > the log files it is also a good alert for users. Besides for
> > deserialize
> > > > errors we already log at WARN level for this reason.
> > > >
> > > > Regarding the metrics-levels: I was pondering on that as well. What
> > made
> > > me
> > > > to think and agree on task-level than thread-level is that for some
> > > reasons
> > > > like window retention, they may possibly be happening on a subset of
> > > input
> > > > partitions, and tasks are correlated with partitions the task-level
> > > metrics
> > > > can help users to narrow down on the specific input data partitions.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Mon, Apr 2, 2018 at 6:43 PM, John Roesler 
> > wrote:
> > > >
> > > >> Hi Matthias,
> > > >>
> > > >> No worries! Thanks for the reply.
> > > >>
> > > >> 1) There isn't a connection. I tried using the TopologyTestDriver to
> > > write
> > > >> a quick test exercising the current behavior and discovered that the
> > > >> metrics weren't available. It seemed like they should be, so I
> tacked
> > > it on
> > > >> to this KIP. If you feel it's inappropriate, I can pull it back out.
> > > >>
> > > >> 2) I was also concerned about that, but I figured it would come up
> in
> > > >> discussion if I just went ahead and proposed it. And here we are!
> > > >>
> > > >> Here's my thought: maybe there are two classes of skips:
> "controlled"
> > > and
> > > >> "uncontrolled", where "controlled" means, as an app author, I
> > > deliberately
> > > >> filter out some events, and "uncontrolled" means that I simply don't
> > > >> account for some feature of the data, and the framework skips them
> (as
> > > >> opposed to crashing).
> > > >>
> > > >> In this breakdowns, the skips I'm adding metrics for are all
> > > uncontrolled
> > > >> skips (and we hope to measure all the uncontrolled skips). Our skips
> > are
> > > >> well documented, so it wouldn't be terrible to have an application
> in
> > > which
> > > >> you know you expect to have tons of uncontrolled skips, but it's not
> > > great
> > > >> either, since you may also have some *unexpected* uncontrolled
> skips.
> > > It'll
> > > >> be difficult to notice, since you're probably not alerting on the
> > metric
> > > >> and filtering out the logs (whatever their level).
> > > >>
> > > >> I'd recommend any app author, as an alternative, to convert all
> > expected
> > > >> skips to controlled ones, by updating the topology to filter those
> > > records
> > > >> out.
> > > >>
> > > >> Following from my recommendation, as a library author, I'm inclined
> to
> > > mark
> > > >> those logs WARN, since in my opinion, they should be concerning to
> the
> > > app
> 

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
Thanks Guozhang, that was my intent.

@John: yes, we should not nail down the exact log message. It's just to
point out the trade-off. If we can get the required information in the
logs, we might not need task level metrics.


-Matthias

On 4/3/18 11:26 AM, Guozhang Wang wrote:
> I think Matthias' comment is that, we can still record the metrics on the
> thread-level, while having the WARN log entry to include sufficient context
> information so that users can still easily narrow down the investigation
> scope.
> 
> 
> Guozhang
> 
> On Tue, Apr 3, 2018 at 11:22 AM, John Roesler  wrote:
> 
>> I agree we should add as much information as is reasonable to the log. For
>> example, see this WIP PR I started for this KIP:
>>
>> https://github.com/apache/kafka/pull/4812/files#diff-
>> 88d129f048bc842c7db5b2566a45fce8R80
>>
>> and
>>
>> https://github.com/apache/kafka/pull/4812/files#diff-
>> 69e6789eb675ec978a1abd24fed96eb1R111
>>
>> I'm not sure if we should nail down the log messages in the KIP or in the
>> PR discussion. What say you?
>>
>> Thanks,
>> -John
>>
>> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax 
>> wrote:
>>
>>> Thanks for sharing your thoughts. As I mentioned originally, I am not
>>> sure about the right log level either. Your arguments are convincing --
>>> thus, I am fine with keeping WARN level.
>>>
>>> The task vs thread level argument is an interesting one. However, I am
>>> wondering if we should add this information into the corresponding WARN
>>> logs that we write anyway? For this case, we can also log the
>>> corresponding operator (and other information like topic name etc if
>>> needed). WDYT about this?
>>>
>>>
>>> -Matthias
>>>
>>> On 4/2/18 8:31 PM, Guozhang Wang wrote:
 Regarding logging: I'm inclined to keep logging at WARN level since
>>> skipped
 records are not expected in normal execution (for all reasons that we
>> are
 aware of), and hence when error happens users should be alerted from
 metrics and looked into the log files, so to me if it is really
>> spamming
 the log files it is also a good alert for users. Besides for
>> deserialize
 errors we already log at WARN level for this reason.

 Regarding the metrics-levels: I was pondering on that as well. What
>> made
>>> me
 to think and agree on task-level than thread-level is that for some
>>> reasons
 like window retention, they may possibly be happening on a subset of
>>> input
 partitions, and tasks are correlated with partitions the task-level
>>> metrics
 can help users to narrow down on the specific input data partitions.


 Guozhang


 On Mon, Apr 2, 2018 at 6:43 PM, John Roesler 
>> wrote:

> Hi Matthias,
>
> No worries! Thanks for the reply.
>
> 1) There isn't a connection. I tried using the TopologyTestDriver to
>>> write
> a quick test exercising the current behavior and discovered that the
> metrics weren't available. It seemed like they should be, so I tacked
>>> it on
> to this KIP. If you feel it's inappropriate, I can pull it back out.
>
> 2) I was also concerned about that, but I figured it would come up in
> discussion if I just went ahead and proposed it. And here we are!
>
> Here's my thought: maybe there are two classes of skips: "controlled"
>>> and
> "uncontrolled", where "controlled" means, as an app author, I
>>> deliberately
> filter out some events, and "uncontrolled" means that I simply don't
> account for some feature of the data, and the framework skips them (as
> opposed to crashing).
>
> In this breakdowns, the skips I'm adding metrics for are all
>>> uncontrolled
> skips (and we hope to measure all the uncontrolled skips). Our skips
>> are
> well documented, so it wouldn't be terrible to have an application in
>>> which
> you know you expect to have tons of uncontrolled skips, but it's not
>>> great
> either, since you may also have some *unexpected* uncontrolled skips.
>>> It'll
> be difficult to notice, since you're probably not alerting on the
>> metric
> and filtering out the logs (whatever their level).
>
> I'd recommend any app author, as an alternative, to convert all
>> expected
> skips to controlled ones, by updating the topology to filter those
>>> records
> out.
>
> Following from my recommendation, as a library author, I'm inclined to
>>> mark
> those logs WARN, since in my opinion, they should be concerning to the
>>> app
> authors. I'd definitely want to show, rather than hide, them by
>>> default, so
> I would pick INFO at least.
>
> That said, logging is always a tricky issue for lower-level libraries
>>> that
> run inside user code, since we don't have all the information we need
>> to
> make the right call.
>
>
>
> On your last note, yeah, I got that impression from Guozhang as well.
> Thanks for the clarification.
>
>>>

Build failed in Jenkins: kafka-trunk-jdk9 #527

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[me] MINOR: Make kafka-streams-test-utils dependencies work with releases

[me] KAFKA-6728: Corrected the worker’s instantiation of the HeaderConverter

[wangguoz] MINOR: Refactor return value (#4810)

--
[...truncated 1.48 MB...]
kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
STARTED

kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
PASSED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent STARTED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent PASSED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException 
STARTED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrLive STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrLive PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElectionLastIsrShuttingDown STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElectionLastIsrShuttingDown PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElection STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElection PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaInIsrNotLive STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaInIsrNotLive PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElectionWithNoLiveIsr STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElectionWithNoLiveIsr PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElection STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElection PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElection STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElection PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElection STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElection PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElectionWithEmptyIsr STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testReassignPartitionLeaderElectionWithEmptyIsr PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElectionAllIsrSimultaneouslyShutdown 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElectionAllIsrSimultaneouslyShutdown PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp 

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
I am fine adding `TopologyTestDriver#metrics()`. It should be helpful for
testing custom metrics that users implement.
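
For illustration, a test could then do something like the following. This is
only a sketch: the getter would mirror `KafkaStreams#metrics()`, the metric
name below is a placeholder, and the usual topology/props setup is assumed:

    ConsumerRecordFactory<String, String> recordFactory =
        new ConsumerRecordFactory<>(new StringSerializer(), new StringSerializer());
    TopologyTestDriver driver = new TopologyTestDriver(topology, props);
    driver.pipeInput(recordFactory.create("input-topic", "key", "unparseable-value"));

    Map<MetricName, ? extends Metric> metrics = driver.metrics();
    metrics.forEach((name, metric) -> {
        if ("skipped-records-total".equals(name.name())) {
            // metricValue() on newer clients; value() on older ones
            System.out.println(name.tags() + " -> " + metric.metricValue());
        }
    });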

-Matthias


On 4/3/18 11:35 AM, John Roesler wrote:
> Oh, sorry, I missed the point.
> 
> Yeah, we can totally do that. The reason to move it to the task level was
> mainly to make it available for the metrics in TopologyTestDriver as well.
> But if we decide that's a non-goal, then there's no motivation to change it.
> 
> And actually that reminds me that we do have an open question about whether
> I should add a metrics getter to the TopologyTestDriver's interface. WDYT?
> 
> Thanks,
> -John
> 
> On Tue, Apr 3, 2018 at 1:26 PM, Guozhang Wang  wrote:
> 
>> I think Matthias' comment is that, we can still record the metrics on the
>> thread-level, while having the WARN log entry to include sufficient context
>> information so that users can still easily narrow down the investigation
>> scope.
>>
>>
>> Guozhang
>>
>> On Tue, Apr 3, 2018 at 11:22 AM, John Roesler  wrote:
>>
>>> I agree we should add as much information as is reasonable to the log.
>> For
>>> example, see this WIP PR I started for this KIP:
>>>
>>> https://github.com/apache/kafka/pull/4812/files#diff-
>>> 88d129f048bc842c7db5b2566a45fce8R80
>>>
>>> and
>>>
>>> https://github.com/apache/kafka/pull/4812/files#diff-
>>> 69e6789eb675ec978a1abd24fed96eb1R111
>>>
>>> I'm not sure if we should nail down the log messages in the KIP or in the
>>> PR discussion. What say you?
>>>
>>> Thanks,
>>> -John
>>>
>>> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax 
>>> wrote:
>>>
 Thanks for sharing your thoughts. As I mentioned originally, I am not
 sure about the right log level either. Your arguments are convincing --
 thus, I am fine with keeping WARN level.

 The task vs thread level argument is an interesting one. However, I am
 wondering if we should add this information into the corresponding WARN
 logs that we write anyway? For this case, we can also log the
 corresponding operator (and other information like topic name etc if
 needed). WDYT about this?


 -Matthias

 On 4/2/18 8:31 PM, Guozhang Wang wrote:
> Regarding logging: I'm inclined to keep logging at WARN level since
 skipped
> records are not expected in normal execution (for all reasons that we
>>> are
> aware of), and hence when error happens users should be alerted from
> metrics and looked into the log files, so to me if it is really
>>> spamming
> the log files it is also a good alert for users. Besides for
>>> deserialize
> errors we already log at WARN level for this reason.
>
> Regarding the metrics-levels: I was pondering on that as well. What
>>> made
 me
> to think and agree on task-level than thread-level is that for some
 reasons
> like window retention, they may possibly be happening on a subset of
 input
> partitions, and tasks are correlated with partitions the task-level
 metrics
> can help users to narrow down on the specific input data partitions.
>
>
> Guozhang
>
>
> On Mon, Apr 2, 2018 at 6:43 PM, John Roesler 
>>> wrote:
>
>> Hi Matthias,
>>
>> No worries! Thanks for the reply.
>>
>> 1) There isn't a connection. I tried using the TopologyTestDriver to
 write
>> a quick test exercising the current behavior and discovered that the
>> metrics weren't available. It seemed like they should be, so I
>> tacked
 it on
>> to this KIP. If you feel it's inappropriate, I can pull it back out.
>>
>> 2) I was also concerned about that, but I figured it would come up
>> in
>> discussion if I just went ahead and proposed it. And here we are!
>>
>> Here's my thought: maybe there are two classes of skips:
>> "controlled"
 and
>> "uncontrolled", where "controlled" means, as an app author, I
 deliberately
>> filter out some events, and "uncontrolled" means that I simply don't
>> account for some feature of the data, and the framework skips them
>> (as
>> opposed to crashing).
>>
>> In this breakdowns, the skips I'm adding metrics for are all
 uncontrolled
>> skips (and we hope to measure all the uncontrolled skips). Our skips
>>> are
>> well documented, so it wouldn't be terrible to have an application
>> in
 which
>> you know you expect to have tons of uncontrolled skips, but it's not
 great
>> either, since you may also have some *unexpected* uncontrolled
>> skips.
 It'll
>> be difficult to notice, since you're probably not alerting on the
>>> metric
>> and filtering out the logs (whatever their level).
>>
>> I'd recommend any app author, as an alternative, to convert all
>>> expected
>> skips to controlled ones, by updating the topology to filter those
 records
>> out.
>>
>> Following from my recommendation, as a library author, I'm inclined
>> to
 mark
>> tho

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
1. If we can indeed gather all the context information from the log4j
entries, I'd suggest we change to thread-level (I'm not sure if that is
doable, so if John already has a WIP PR it can help us decide).

2. We can consider adding the API in TopologyTestDriver for general testing
purposes; that being said, I think Matthias has a good point that this
alone should not be a driving motivation for us to keep this metric as
task-level if 1) is true.



Guozhang


On Tue, Apr 3, 2018 at 11:36 AM, Matthias J. Sax 
wrote:

> Thanks Guozhang, that was my intent.
>
> @John: yes, we should not nail down the exact log message. It's just to
> point out the trade-off. If we can get the required information in the
> logs, we might not need task level metrics.
>
>
> -Matthias
>
> On 4/3/18 11:26 AM, Guozhang Wang wrote:
> > I think Matthias' comment is that, we can still record the metrics on the
> > thread-level, while having the WARN log entry to include sufficient
> context
> > information so that users can still easily narrow down the investigation
> > scope.
> >
> >
> > Guozhang
> >
> > On Tue, Apr 3, 2018 at 11:22 AM, John Roesler  wrote:
> >
> >> I agree we should add as much information as is reasonable to the log.
> For
> >> example, see this WIP PR I started for this KIP:
> >>
> >> https://github.com/apache/kafka/pull/4812/files#diff-
> >> 88d129f048bc842c7db5b2566a45fce8R80
> >>
> >> and
> >>
> >> https://github.com/apache/kafka/pull/4812/files#diff-
> >> 69e6789eb675ec978a1abd24fed96eb1R111
> >>
> >> I'm not sure if we should nail down the log messages in the KIP or in
> the
> >> PR discussion. What say you?
> >>
> >> Thanks,
> >> -John
> >>
> >> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax  >
> >> wrote:
> >>
> >>> Thanks for sharing your thoughts. As I mentioned originally, I am not
> >>> sure about the right log level either. Your arguments are convincing --
> >>> thus, I am fine with keeping WARN level.
> >>>
> >>> The task vs thread level argument is an interesting one. However, I am
> >>> wondering if we should add this information into the corresponding WARN
> >>> logs that we write anyway? For this case, we can also log the
> >>> corresponding operator (and other information like topic name etc if
> >>> needed). WDYT about this?
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 4/2/18 8:31 PM, Guozhang Wang wrote:
>  Regarding logging: I'm inclined to keep logging at WARN level since
> >>> skipped
>  records are not expected in normal execution (for all reasons that we
> >> are
>  aware of), and hence when error happens users should be alerted from
>  metrics and looked into the log files, so to me if it is really
> >> spamming
>  the log files it is also a good alert for users. Besides for
> >> deserialize
>  errors we already log at WARN level for this reason.
> 
>  Regarding the metrics-levels: I was pondering on that as well. What
> >> made
> >>> me
>  to think and agree on task-level than thread-level is that for some
> >>> reasons
>  like window retention, they may possibly be happening on a subset of
> >>> input
>  partitions, and tasks are correlated with partitions the task-level
> >>> metrics
>  can help users to narrow down on the specific input data partitions.
> 
> 
>  Guozhang
> 
> 
>  On Mon, Apr 2, 2018 at 6:43 PM, John Roesler 
> >> wrote:
> 
> > Hi Matthias,
> >
> > No worries! Thanks for the reply.
> >
> > 1) There isn't a connection. I tried using the TopologyTestDriver to
> >>> write
> > a quick test exercising the current behavior and discovered that the
> > metrics weren't available. It seemed like they should be, so I tacked
> >>> it on
> > to this KIP. If you feel it's inappropriate, I can pull it back out.
> >
> > 2) I was also concerned about that, but I figured it would come up in
> > discussion if I just went ahead and proposed it. And here we are!
> >
> > Here's my thought: maybe there are two classes of skips: "controlled"
> >>> and
> > "uncontrolled", where "controlled" means, as an app author, I
> >>> deliberately
> > filter out some events, and "uncontrolled" means that I simply don't
> > account for some feature of the data, and the framework skips them
> (as
> > opposed to crashing).
> >
> > In this breakdowns, the skips I'm adding metrics for are all
> >>> uncontrolled
> > skips (and we hope to measure all the uncontrolled skips). Our skips
> >> are
> > well documented, so it wouldn't be terrible to have an application in
> >>> which
> > you know you expect to have tons of uncontrolled skips, but it's not
> >>> great
> > either, since you may also have some *unexpected* uncontrolled skips.
> >>> It'll
> > be difficult to notice, since you're probably not alerting on the
> >> metric
> > and filtering out the logs (whatever their level).
> >
> > I'd recommend any app author, as an al

Build failed in Jenkins: kafka-1.0-jdk7 #183

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-6739; Ignore headers when down-converting from V2 to V0/V1 (#4813)

--
[...truncated 373.00 KB...]
kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryTailIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica 
STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown PASSED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest > 
shouldSurviveFastLeaderChange STARTED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptance

Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-04-03 Thread John Roesler
Hi Dong,

Thanks for the response. I think I'm broadly in agreement with your
statement that with KIP-253 implemented as written, KStreams would be able
to use partition expansion on its various internal topics and processor
threads.

As you say, the reason we can do this is that for internal topics, the
producer and consumer are in the same code base and therefore have full
visibility on the partition function. So the reason for backfill is not
primarily to support expanding KStreams internal topics, but rather to
support partition expansion when the producer and consumer do not already
share code.

Examples of cases where backfill would be useful:
* A KStreams app consumes a log-compacted input
* A KStreams app produces a log-compacted output, which is consumed by an
external stateful system via Kafka Consumer
* In general, stateful usages of Kafka Consumer on log-compacted topics
that were not produced in the same application

So, in conclusion, I remain in support of KIP-253, since it gives us "key
partitioning preservation" (my #4 in topic characteristics above) and
"single-key
ordering preservation" (my #6 above). With these in place, all use cases
can at least not break in the presence of partition expansion.
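
To make the "key partitioning preservation" point concrete, here is a minimal
sketch of the linear-hashing idea the KIP relies on. It is illustrative only,
not the KIP's actual algorithm (which also has to cope with repeated
expansions); Utils.murmur2/toPositive are the client's standard hash helpers:

    import org.apache.kafka.common.utils.Utils;

    public final class LinearHashingSketch {

        // With an initial partition count m and a current count n (m <= n <= 2m),
        // a key either stays in its original partition (hash % m) or moves to
        // exactly one well-defined new partition ((hash % m) + m), so records for
        // a given key never shuffle among the pre-existing partitions.
        public static int partitionFor(byte[] keyBytes, int initialPartitions, int currentPartitions) {
            int hash = Utils.toPositive(Utils.murmur2(keyBytes));
            int candidate = hash % (2 * initialPartitions);
            return candidate < currentPartitions ? candidate : hash % initialPartitions;
        }

        private LinearHashingSketch() { }
    }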

I'm advocating to add backfill, either in this KIP or a future one, only to
allow those three use cases to *benefit* from partition expansion. In other
words, to allow them to expand their capacity to match expanded partitions.

Thanks,
-John

On Tue, Apr 3, 2018 at 10:38 AM, Dong Lin  wrote:

> Hey John,
>
> Thanks much for your comments!!
>
> I have yet to go through the emails of John/Jun/Guozhang in detail. But let
> me present my idea for how to minimize the delay for state loading for
> stream use-case.
>
> For ease of understanding, let's assume that the initial partition number
> of input topics and change log topic are both 10. And initial number of
> stream processor is also 10. If we only increase initial partition number
> of input topics to 15 without changing number of stream processor, the
> current KIP already guarantees in-order delivery and no state needs to be
> moved between consumers for stream use-case. Next, let's say we want to
> increase the number of processor to expand the processing capacity for
> stream use-case. This requires us to move state between processors which
> will take time. Our goal is to minimize the impact (i.e. delay) for
> processing while we increase the number of processors.
>
> Note that stream processor generally includes both consumer and producer.
> In addition to consume from the input topic, consumer may also need to
> consume from change log topic on startup for recovery. And producer may
> produce state to the change log topic.
>
> The solution will include the following steps:
>
> 1) Increase partition number of the input topic from 10 to 15. Since the
> messages with the same key will still go to the same consume before and
> after the partition expansion, this step can be done without having to move
> state between processors.
>
> 2) Increase partition number of the change log topic from 10 to 15. Note
> that this step can also be done without impacting existing workflow. After
> we increase partition number of the change log topic, key space may split
> and some key will be produced to the newly-added partition. But the same
> key will still go to the same processor (i.e. consumer) before and after
> the partition. Thus this step can also be done without having to move state
> between processors.
>
> 3) Now, let's add 5 new consumers whose groupId is different from the
> existing processor's groupId. Thus these new consumers will not impact
> existing workflow. Each of these new consumers should consume two
> partitions from the earliest offset, where these two partitions are the
> same partitions that will be consumed if the consumers have the same
> groupId as the existing processor's groupId. For example, the first of the
> five consumers will consume partition 0 and partition 10. The purpose of
> these consumers is to rebuild the state (e.g. RocksDB) for the processors
> in advance. Also note that, by design of the current KIP, each consume will
> consume the existing partition of the change log topic up to the offset
> before the partition expansion. Then they will only need to consume the
> state of the new partition of the change log topic.
>
> 4) After consumers have caught up in step 3), we should stop these
> consumers and add 5 new processors to the stream processing job. These 5
> new processors should run in the same location as the previous 5 consumers
> to re-use the state (e.g. RocksDB). And these processors' consumers should
> consume partitions of the change log topic from the committed offset the
> previous 5 consumers so that no state is missed.
>
> One important trick to note here is that, the mapping from partition to
> consumer should also use linear hashing. And we need to remember the
> initial number of proces

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
Allrighty, how about this, then...

I'll move the metric back to the StreamThread and maintain the existing tag
(client-id=...(per-thread client-id)). It won't be present in the
TopologyTestDriver's metrics.

As a side note, I'm not sure that the location of the log messages has
visibility into the name of the thread or the task, or the processor node,
for that matter. But at the end of the day, I don't think it really matters.

None of those identifiers are in the public interface or user-controlled.
For them to be useful for debugging, users would have to gain a very deep
understanding of how their DSL program gets executed. From my perspective,
they are all included in metric tags only to prevent collisions between the
same metrics in different (e.g.) threads.

I think what's important is to provide the right information in the logs
so that users will be able to debug their issues. This is why the logs in my
PR include the topic/partition/offset of the offending data, as well as the
stacktrace of the exception from the deserializer (or for timestamps, the
extracted timestamp and the class name of their extractor). This
information alone should let them pinpoint the offending data and fix it.

(I am aware that the topic name might be a repartition topic, and
therefore also esoteric from the user's perspective, but I think it's the
best we can do right now. It might be nice to explicitly take on a
debugging ergonomics task in the future and give all processor nodes
human-friendly names. Then, we could surface these names in any logs or
exceptions. But I'm inclined to call this out-of-scope for now.)

Thanks again,
-John

On Tue, Apr 3, 2018 at 1:40 PM, Guozhang Wang  wrote:

> 1. If we can indeed gather all the context information from the log4j
> entries I'd suggest we change to thread-level (I'm not sure if that is
> doable, so if John have already some WIP PR that can help us decide).
>
> 2. We can consider adding the API in TopologyTestDriver for general testing
> purposes; that being said, I think Matthias has a good point that this
> alone should not be a driving motivation for us to keep this metric as
> task-level if 1) is true.
>
>
>
> Guozhang
>
>
> On Tue, Apr 3, 2018 at 11:36 AM, Matthias J. Sax 
> wrote:
>
> > Thanks Guozhang, that was my intent.
> >
> > @John: yes, we should not nail down the exact log message. It's just to
> > point out the trade-off. If we can get the required information in the
> > logs, we might not need task level metrics.
> >
> >
> > -Matthias
> >
> > On 4/3/18 11:26 AM, Guozhang Wang wrote:
> > > I think Matthias' comment is that, we can still record the metrics on
> the
> > > thread-level, while having the WARN log entry to include sufficient
> > context
> > > information so that users can still easily narrow down the
> investigation
> > > scope.
> > >
> > >
> > > Guozhang
> > >
> > > On Tue, Apr 3, 2018 at 11:22 AM, John Roesler 
> wrote:
> > >
> > >> I agree we should add as much information as is reasonable to the log.
> > For
> > >> example, see this WIP PR I started for this KIP:
> > >>
> > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > >> 88d129f048bc842c7db5b2566a45fce8R80
> > >>
> > >> and
> > >>
> > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > >> 69e6789eb675ec978a1abd24fed96eb1R111
> > >>
> > >> I'm not sure if we should nail down the log messages in the KIP or in
> > the
> > >> PR discussion. What say you?
> > >>
> > >> Thanks,
> > >> -John
> > >>
> > >> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > >> wrote:
> > >>
> > >>> Thanks for sharing your thoughts. As I mentioned originally, I am not
> > >>> sure about the right log level either. Your arguments are convincing
> --
> > >>> thus, I am fine with keeping WARN level.
> > >>>
> > >>> The task vs thread level argument is an interesting one. However, I
> am
> > >>> wondering if we should add this information into the corresponding
> WARN
> > >>> logs that we write anyway? For this case, we can also log the
> > >>> corresponding operator (and other information like topic name etc if
> > >>> needed). WDYT about this?
> > >>>
> > >>>
> > >>> -Matthias
> > >>>
> > >>> On 4/2/18 8:31 PM, Guozhang Wang wrote:
> >  Regarding logging: I'm inclined to keep logging at WARN level since
> > >>> skipped
> >  records are not expected in normal execution (for all reasons that
> we
> > >> are
> >  aware of), and hence when error happens users should be alerted from
> >  metrics and looked into the log files, so to me if it is really
> > >> spamming
> >  the log files it is also a good alert for users. Besides for
> > >> deserialize
> >  errors we already log at WARN level for this reason.
> > 
> >  Regarding the metrics-levels: I was pondering on that as well. What
> > >> made
> > >>> me
> >  to think and agree on task-level than thread-level is that for some
> > >>> reasons
> >  like window retention, they may poss

Build failed in Jenkins: kafka-trunk-jdk9 #528

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-6739; Ignore headers when down-converting from V2 to V0/V1 (#4813)

--
[...truncated 1.48 MB...]
kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateSavedOffsetWhenOffsetToClearToIsBetweenEpochs STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateSavedOffsetWhenOffsetToClearToIsBetweenEpochs PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryTailIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryTailIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica 
STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown PASSED

kafka.server.epoch.OffsetsFor

[jira] [Created] (KAFKA-6745) kafka consumer rebalancing takes long time (from 3 secs to 5 minutes)

2018-04-03 Thread Ramkumar (JIRA)
Ramkumar created KAFKA-6745:
---

 Summary: kafka consumer rebalancing takes long time (from 3 secs 
to 5 minutes)
 Key: KAFKA-6745
 URL: https://issues.apache.org/jira/browse/KAFKA-6745
 Project: Kafka
  Issue Type: Improvement
  Components: clients, core
Affects Versions: 0.11.0.0
Reporter: Ramkumar


Hi, we had an HTTP service (3 nodes) in front of Kafka 0.8. This HTTP service acts as
a REST API, so publishers and consumers go through the middleware instead of using the
Kafka client API directly. With that setup, consumer rebalancing is not a major issue.

We wanted to upgrade to Kafka 0.11, so we updated our HTTP services (3-node cluster) to
use the new Kafka consumer API. Now rebalancing of consumers (multiple consumers under
the same group) takes anywhere from a few seconds up to 5 minutes (max.poll.interval.ms).
Because of this delay our HTTP clients time out and fail over, so this rebalancing time
is a major issue. It is not clear from the documentation whether rebalance activity for
the group only takes place after max.poll.interval.ms, or whether it starts after ~3
seconds and can complete any time within 5 minutes. We tried reducing
max.poll.interval.ms to 15 seconds, but this also triggers rebalances internally.

Below are the other parameters we have set in our service:
max.poll.interval.ms = 30 seconds
heartbeat.interval.ms = 1 minute
session.timeout.ms = 4 minutes
consumer.cache.timeout = 2 min
 
 
Below is the log:
""2018-03-26 12:53:23,009 [qtp1404928347-11556] INFO  
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining 
group firstnetportal_001

""2018-03-26 12:57:52,793 [qtp1404928347-11556] INFO  
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully 
joined group firstnetportal_001 with generation 7475

Please let me know if any other application/client uses an HTTP interface across 3
nodes without running into this issue.
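
For reference, a hedged sketch of the Kafka-side settings listed above expressed
against the new consumer API. The broker list and surrounding code are illustrative,
and consumer.cache.timeout appears to be a setting of the HTTP service itself rather
than a Kafka consumer config.

    // imports assumed: java.util.Properties, org.apache.kafka.clients.consumer.*,
    //                  org.apache.kafka.common.serialization.StringDeserializer
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // illustrative
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "firstnetportal_001");
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "30000");       // 30 seconds
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "60000");      // 1 minute
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "240000");        // 4 minutes
    KafkaConsumer<String, String> consumer =
        new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());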
 
 
 
 
 
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Bill Bejeck
Hi John,

Thanks for making the updates.

I agree with the information you've included in the logs as described
above, as log statements without enough context/information can be
frustrating.

-Bill

On Tue, Apr 3, 2018 at 3:29 PM, John Roesler  wrote:

> Allrighty, how about this, then...
>
> I'll move the metric back to the StreamThread and maintain the existing tag
> (client-id=...(per-thread client-id)). It won't be present in the
> TopologyTestDriver's metrics.
>
> As a side note, I'm not sure that the location of the log messages has
> visibility into the name of the thread or the task, or the processor node,
> for that matter. But at the end of the day, I don't think it really
> matters.
>
> None of those identifiers are in the public interface or user-controlled.
> For them to be useful for debugging, users would have to gain a very deep
> understanding of how their DSL program gets executed. From my perspective,
> they are all included in metric tags only to prevent collisions between the
> same metrics in different (e.g.) threads.
>
> I think what's important is to provide the right information in the logs
> that users will be able to debug their issues. This is why the logs in my
> pr include the topic/partition/offset of the offending data, as well as the
> stacktrace of the exception from the deserializer (or for timestamps, the
> extracted timestamp and the class name of their extractor). This
> information alone should let them pinpoint the offending data and fix it.
>
> (I am aware that that topic name might be a repartition topic, and
> therefore also esoteric from the user's perspective, but I think it's the
> best we can do right now. It might be nice to explicitly take on a
> debugging ergonomics task in the future and give all processor nodes
> human-friendly names. Then, we could surface these names in any logs or
> exceptions. But I'm inclined to call this out-of-scope for now.)
>
> Thanks again,
> -John
>
> On Tue, Apr 3, 2018 at 1:40 PM, Guozhang Wang  wrote:
>
> > 1. If we can indeed gather all the context information from the log4j
> > entries I'd suggest we change to thread-level (I'm not sure if that is
> > doable, so if John have already some WIP PR that can help us decide).
> >
> > 2. We can consider adding the API in TopologyTestDriver for general
> testing
> > purposes; that being said, I think Matthias has a good point that this
> > alone should not be a driving motivation for us to keep this metric as
> > task-level if 1) is true.
> >
> >
> >
> > Guozhang
> >
> >
> > On Tue, Apr 3, 2018 at 11:36 AM, Matthias J. Sax 
> > wrote:
> >
> > > Thanks Guozhang, that was my intent.
> > >
> > > @John: yes, we should not nail down the exact log message. It's just to
> > > point out the trade-off. If we can get the required information in the
> > > logs, we might not need task level metrics.
> > >
> > >
> > > -Matthias
> > >
> > > On 4/3/18 11:26 AM, Guozhang Wang wrote:
> > > > I think Matthias' comment is that, we can still record the metrics on
> > the
> > > > thread-level, while having the WARN log entry to include sufficient
> > > context
> > > > information so that users can still easily narrow down the
> > investigation
> > > > scope.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Tue, Apr 3, 2018 at 11:22 AM, John Roesler 
> > wrote:
> > > >
> > > >> I agree we should add as much information as is reasonable to the
> log.
> > > For
> > > >> example, see this WIP PR I started for this KIP:
> > > >>
> > > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > > >> 88d129f048bc842c7db5b2566a45fce8R80
> > > >>
> > > >> and
> > > >>
> > > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > > >> 69e6789eb675ec978a1abd24fed96eb1R111
> > > >>
> > > >> I'm not sure if we should nail down the log messages in the KIP or
> in
> > > the
> > > >> PR discussion. What say you?
> > > >>
> > > >> Thanks,
> > > >> -John
> > > >>
> > > >> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax <
> > matth...@confluent.io
> > > >
> > > >> wrote:
> > > >>
> > > >>> Thanks for sharing your thoughts. As I mentioned originally, I am
> not
> > > >>> sure about the right log level either. Your arguments are
> convincing
> > --
> > > >>> thus, I am fine with keeping WARN level.
> > > >>>
> > > >>> The task vs thread level argument is an interesting one. However, I
> > am
> > > >>> wondering if we should add this information into the corresponding
> > WARN
> > > >>> logs that we write anyway? For this case, we can also log the
> > > >>> corresponding operator (and other information like topic name etc
> if
> > > >>> needed). WDYT about this?
> > > >>>
> > > >>>
> > > >>> -Matthias
> > > >>>
> > > >>> On 4/2/18 8:31 PM, Guozhang Wang wrote:
> > >  Regarding logging: I'm inclined to keep logging at WARN level
> since
> > > >>> skipped
> > >  records are not expected in normal execution (for all reasons that
> > we
> > > >> are
> > >  aware of), and hence when error happens users sh

Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different consumers

2018-04-03 Thread Bill Bejeck
Boyang,

Thanks for making changes to the KIP,  I'm +1 on the updated version.

-Bill

On Tue, Apr 3, 2018 at 1:14 AM, Boyang Chen  wrote:

> Hey friends,
>
>
> both KIP-276 (Add StreamsConfig prefix for different consumers) and the pull
> request <https://github.com/apache/kafka/pull/4805> are updated. Feel free to
> take another look.
>
>
>
> Thanks for your valuable feedback!
>
>
> Best,
>
> Boyang
>
> 
> From: Boyang Chen 
> Sent: Tuesday, April 3, 2018 11:39 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different
> consumers
>
> Thanks Matthias, Ted and Guozhang for the inputs. I shall address them in
> next round.
>
>
> 
> From: Matthias J. Sax 
> Sent: Tuesday, April 3, 2018 4:43 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different
> consumers
>
> Yes, your examples make sense to me. That was the idea behind the proposal.
>
>
> -Matthias
>
> On 4/2/18 11:25 AM, Guozhang Wang wrote:
> > @Matthias
> >
> > That's a good question: I think adding another config for the main
> consumer
> > makes good tradeoffs between compatibility and new user convenience. Just
> > to clarify, from user's pov on upgrade:
> >
> > 1) I'm already overriding some consumer configs, and now I want to
> override
> > these values differently for restore consumers, I'd add one new line for
> > the restore consumer prefix.
> >
> > 2) I'm already overriding some consumer configs, and now I want to NOT
> > overriding them for restore consumers, I'd change my override from
> > `consumer.X` to `main.consumer.X`.
> >
> > 3) I'm new and have not any consumer overrides, and now if I want to
> > override some, I'd use `main.consumer`, `restore.consumer` for specific
> > consumer types, and ONLY consider `consumer` for the ones that I want to
> > apply universally.
> >
> > 4) I'm already overriding some consumer configs and I'm happy with what I
> > get, I do not change anything.
> >
> >
> > Guozhang
> >
> > On Mon, Apr 2, 2018 at 11:10 AM, Ted Yu  wrote:
> >
> >> bq. to introduce one more prefix `main.consumer.`
> >>
> >> Makes sense.
> >>
> >> On Mon, Apr 2, 2018 at 11:06 AM, Matthias J. Sax  >
> >> wrote:
> >>
> >>> Boyang,
> >>>
> >>> thanks a lot for the KIP.
> >>>
> >>> Couple of questions:
> >>>
>  (MODIFIED) public Map getRestoreConsumerConfigs(final
> >>> String clientId);
> >>>
> >>> You mean that the implementation/semantics of this method changes, but
> >>> not the API itself? Might be good to add this as "comment style"
> instead
> >>> of "(MODIFIED)" prefix.
> >>>
>  By rewriting the getRestoreConsumerConfigs() function and adding the
> >>> getGlobalConsumerConfigs() function, if one user uses
> >>> restoreConsumerPrefix() or globalConsumerPrefix() when adding new
> >>> configurations, the configs shall overwrite base consumer config. If
> not
> >>> specified, restore consumer and global consumer shall share the same
> >> config
> >>> with base consumer.
> >>>
> >>> While this does make sense for backward compatibility, I am wonder if
> it
> >>> makes the config "inheritance logic" (ie, hierarchy) too complex? We
> >>> basically introduce a second level of overwrites. It might be simpler
> to
> >>> not introduce this hierarchy with the cost to break backward
> >> compatibility.
> >>>
> >>> For example, config `request.timeout.ms`:
> >>>
> >>> User sets `request.timeout.ms=`
> >>> To change it for the main consumer, user also sets
> >>> `consumer.request.timeout.ms=`
> >>>
> >>> If user only wants to change the config for main consumer, but not for
> >>> global or restore consumer, user needs to add two more configs:
> >>>
> >>> `restore.consumer.request.timeout.ms=`
> >>> and
> >>> `global.consumer.request.timeout.ms=`
> >>>
> >>> to reset both back to the default config. IMHO, this is not an optimal
> >>> user experience. Thus, it might be worth to change the semantics for
> >>> `consumer.` prefix to only apply those configs to the main consumer.
> >>>
> >>>
> >>> Not sure what other think what the better solution is (I am not sure by
> >>> myself to be honest---just wanted to point it out and discuss the
> >>> pros/cons for both).
> >>>
> >>>
> >>> Another though would be, to introduce one more prefix `main.consumer.`
> >>> -- using this, the existing `consumer.` prefix would apply to all
> >>> consumers (keeping it's current semantics) while we have overwrites for
> >>> all three consumers -- this allow to directly set `main.consumer`
> >>> instead of `consumer` avoiding the weird pattern from my example above
> >>> and preserves backward compatibility. Ie, if we introduce an hierarchy
> >>> of overwrite, a "full" hierarchy might be better than a "partial"
> >>> hierarchy.
> >>>
> >>>
> >>> Looking forward to your thoughts.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 4/1/18 5:55 PM, Guozhang Wang wrot
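
For concreteness, a hedged sketch of the prefix override hierarchy discussed in this
thread, from the application's point of view. The main.consumer. and restore.consumer.
prefixes are exactly what this KIP proposes (not released API), the config keys and
values are illustrative, and whether the plain consumer. prefix keeps applying to all
consumer types is part of the open question above.

    // imports assumed: java.util.Properties, org.apache.kafka.streams.StreamsConfig,
    //                  org.apache.kafka.clients.consumer.ConsumerConfig
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // existing prefix: today this applies to the consumers Streams creates
    props.put("consumer." + ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "40000");
    // proposed: override only the main consumer ...
    props.put("main.consumer." + ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");
    // ... and only the restore consumer
    props.put("restore.consumer." + ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");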

Build failed in Jenkins: kafka-trunk-jdk8 #2522

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[me] MINOR: Make kafka-streams-test-utils dependencies work with releases

[me] KAFKA-6728: Corrected the worker’s instantiation of the HeaderConverter

[wangguoz] MINOR: Refactor return value (#4810)

--
[...truncated 3.95 MB...]

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfInitializerIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfMaterializedIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfMaterializedIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfMergerIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfMergerIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeCount STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeCount PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeWithoutSpecifyingSerdes STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeWithoutSpecifyingSerdes PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeAggregated STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeAggregated PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnCountIfMaterializedIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnCountIfMaterializedIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfAggregatorIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfAggregatorIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnReduceIfReducerIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnReduceIfReducerIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldAggregateSessionWindowed STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldAggregateSessionWindowed PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldCountSessionWindowed STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldCountSessionWindowed PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldReduceWindowed STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldReduceWindowed PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfAggregatorIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedAggregateIfAggregatorIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfInitializerIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfInitializerIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfMergerIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnAggregateIfMergerIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedReduceIfMaterializedIsNull STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldThrowNullPointerOnMaterializedReduceIfMaterializedIsNull PASSED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeReduced STARTED

org.apache.kafka.streams.kstream.internals.SessionWindowedKStreamImplTest > 
shouldMaterializeReduced PASSED

org.apache.kafka.streams.kstream.internals.KStreamGlobalKTableLeftJoinTest > 
shouldClearGlobalTableEntryOnNullValueUpdates STARTED

org.apache.kafka.streams.kstream.internals.KStreamGlobalKTableLeftJoinTest > 
shouldClearGlobalTableEntryOnNullValueUpdates PASSED

org.apache.kafka.streams.kstream.internals.KStreamGlobalKTableLeftJoinTest > 
sh

Re: Gradle strategy for exposing and using public test-utils modules

2018-04-03 Thread John Roesler
Hello again all,

It turns out that the implementation I provided was not correct after all.

The issue was that Gradle tracked the compiled source files included via

> compile project(':streams').sourceSets.main.output

and included them in the release tarball.

Ewen found the issue and corrected the project to use "compileOnly"
instead, which uses the dependency to compile the project but does not
export the dependency:
https://github.com/apache/kafka/pull/4816

Although Ewen's patch does produce the desired output, I think we'd rather
have test-utils export a regular dependency on the main streams module. But
we can apply the same strategy to allow the streams tests to pull in
test-utils as a compile-only dependency:

> testCompileOnly project(':streams:test-utils')


Here's a PR with the new proposed strategy:
https://github.com/apache/kafka/pull/4821. Feel free to comment.

For those wishing for a summary, here's the new proposed strategy:

project(':streams') {
    ...
    dependencies {
        ...
        // this breaks the dependency cycle:
        testCompileOnly project(':streams:test-utils')
        ...
    }
    ...
}

project(':streams:test-utils') {
    ...
    dependencies {
        compile project(':streams')
        ...
    }
    ...
}

project(':streams:examples') {
    ...
    dependencies {
        compile project(':streams')
        ...
        testCompile project(':streams:test-utils')
        ...
    }
    ...
}


Thanks,
-John

On Tue, Mar 27, 2018 at 7:06 PM, John Roesler  wrote:

> Hi again everyone,
>
> Just for the sake of closure, I think everyone is generally in agreement
> with this approach. If concerns arise later on, please let me know!
>
> Thanks,
> -John
>
> On Fri, Mar 23, 2018 at 12:41 AM, zhenya Sun  wrote:
>
>> +1
>> > On Mar 23, 2018, at 12:20 PM, Ted Yu  wrote:
>> >
>> > +1
>> >  Original message From: "Matthias J. Sax" <
>> matth...@confluent.io> Date: 3/22/18  9:07 PM  (GMT-08:00) To:
>> dev@kafka.apache.org Subject: Re: Gradle strategy for exposing and using
>> public test-utils modules
>> > +1 from my side.
>> >
>> > -Matthias
>> >
>> > On 3/22/18 5:12 PM, John Roesler wrote:
>> >> Yep, I'm super happy with this approach vs. a third module just for the
>> >> tests.
>> >>
>> >> For clairty, here's a PR demonstrating the model we're proposing:
>> >> https://github.com/apache/kafka/pull/4760
>> >>
>> >> Thanks,
>> >> -John
>> >>
>> >> On Thu, Mar 22, 2018 at 6:21 PM, Guozhang Wang 
>> wrote:
>> >>
>> >>> I'm +1 to the approach as well across modules that are going to have
>> test
>> >>> utils artifacts in the future. To me this seems to be a much smaller
>> change
>> >>> we can make to break the circular dependencies than creating a new
>> package
>> >>> for our own testing code.
>> >>>
>> >>> Guozhang
>> >>>
>> >>> On Thu, Mar 22, 2018 at 1:26 PM, Bill Bejeck 
>> wrote:
>> >>>
>>  John,
>> 
>>  Thanks for the clear, detailed explanation.
>> 
>>  I'm +1 on what you have proposed.
>>  While I agree with you manually pulling in transitive test
>> dependencies
>> >>> is
>>  not ideal, in this case, I think it's worth it to get over the
>> circular
>>  dependency hurdle and use streams:test-utils ourselves.
>> 
>>  -Bill
>> 
>>  On Thu, Mar 22, 2018 at 4:09 PM, John Roesler 
>> wrote:
>> 
>> > Hey everyone,
>> >
>> > In 1.1, kafka-streams adds an artifact called
>> >>> 'kafka-streams-test-utils'
>> > (see
>> > https://kafka.apache.org/11/documentation/streams/
>> > developer-guide/testing.html
>> > ).
>> >
>> > The basic idea is to provide first-class support for testing Kafka
>>  Streams
>> > applications. Without that, users were forced to either depend on
>> our
>> > internal test artifacts or develop their own test utilities,
>> neither of
>> > which is ideal.
>> >
>> > I think it would be great if all our APIs offered a similar module,
>> and
>>  it
>> > would all be good if we followed a similar pattern, so I'll describe
>> >>> the
>> > streams approach along with one challenge we had to overcome:
>> >
>> > =
>> > = Project Structure =
>> > =
>> >
>> > The directory structure goes:
>> >
>> > kafka/streams/ <- main module code here
>> >   /test-utils/  <- test utilities module here
>> >   /examples/<- example usages here
>> >
>> > Likewise, the artifacts are:
>> >
>> > kafka-streams
>> > kafka-streams-test-utils
>> > kafka-streams-examples
>> >
>> > And finally, the Gradle build structure is:
>> >
>> > :streams
>> > :streams:test-utils
>> > :streams:examples
>> >
>> >
>> > =
>> > = Problem 1: circular build =
>> > =
>> >
>> > In eat-yo

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
Thanks John, your proposal looks fine to me.

I'll go ahead and look into the PR for more details myself.


Guozhang

On Tue, Apr 3, 2018 at 1:35 PM, Bill Bejeck  wrote:

> Hi John,
>
> Thanks for making the updates.
>
> I agree with the information you've included in the logs as described
> above, as log statements without enough context/information can be
> frustrating.
>
> -Bill
>
> On Tue, Apr 3, 2018 at 3:29 PM, John Roesler  wrote:
>
> > Allrighty, how about this, then...
> >
> > I'll move the metric back to the StreamThread and maintain the existing
> tag
> > (client-id=...(per-thread client-id)). It won't be present in the
> > TopologyTestDriver's metrics.
> >
> > As a side note, I'm not sure that the location of the log messages has
> > visibility into the name of the thread or the task, or the processor
> node,
> > for that matter. But at the end of the day, I don't think it really
> > matters.
> >
> > None of those identifiers are in the public interface or user-controlled.
> > For them to be useful for debugging, users would have to gain a very deep
> > understanding of how their DSL program gets executed. From my
> perspective,
> > they are all included in metric tags only to prevent collisions between
> the
> > same metrics in different (e.g.) threads.
> >
> > I think what's important is to provide the right information in the logs
> > that users will be able to debug their issues. This is why the logs in my
> > pr include the topic/partition/offset of the offending data, as well as
> the
> > stacktrace of the exception from the deserializer (or for timestamps, the
> > extracted timestamp and the class name of their extractor). This
> > information alone should let them pinpoint the offending data and fix it.
> >
> > (I am aware that that topic name might be a repartition topic, and
> > therefore also esoteric from the user's perspective, but I think it's the
> > best we can do right now. It might be nice to explicitly take on a
> > debugging ergonomics task in the future and give all processor nodes
> > human-friendly names. Then, we could surface these names in any logs or
> > exceptions. But I'm inclined to call this out-of-scope for now.)
> >
> > Thanks again,
> > -John
> >
> > On Tue, Apr 3, 2018 at 1:40 PM, Guozhang Wang 
> wrote:
> >
> > > 1. If we can indeed gather all the context information from the log4j
> > > entries I'd suggest we change to thread-level (I'm not sure if that is
> > > doable, so if John have already some WIP PR that can help us decide).
> > >
> > > 2. We can consider adding the API in TopologyTestDriver for general
> > testing
> > > purposes; that being said, I think Matthias has a good point that this
> > > alone should not be a driving motivation for us to keep this metric as
> > > task-level if 1) is true.
> > >
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, Apr 3, 2018 at 11:36 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > > > Thanks Guozhang, that was my intent.
> > > >
> > > > @John: yes, we should not nail down the exact log message. It's just
> to
> > > > point out the trade-off. If we can get the required information in
> the
> > > > logs, we might not need task level metrics.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 4/3/18 11:26 AM, Guozhang Wang wrote:
> > > > > I think Matthias' comment is that, we can still record the metrics
> on
> > > the
> > > > > thread-level, while having the WARN log entry to include sufficient
> > > > context
> > > > > information so that users can still easily narrow down the
> > > investigation
> > > > > scope.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Tue, Apr 3, 2018 at 11:22 AM, John Roesler 
> > > wrote:
> > > > >
> > > > >> I agree we should add as much information as is reasonable to the
> > log.
> > > > For
> > > > >> example, see this WIP PR I started for this KIP:
> > > > >>
> > > > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > > > >> 88d129f048bc842c7db5b2566a45fce8R80
> > > > >>
> > > > >> and
> > > > >>
> > > > >> https://github.com/apache/kafka/pull/4812/files#diff-
> > > > >> 69e6789eb675ec978a1abd24fed96eb1R111
> > > > >>
> > > > >> I'm not sure if we should nail down the log messages in the KIP or
> > in
> > > > the
> > > > >> PR discussion. What say you?
> > > > >>
> > > > >> Thanks,
> > > > >> -John
> > > > >>
> > > > >> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax <
> > > matth...@confluent.io
> > > > >
> > > > >> wrote:
> > > > >>
> > > > >>> Thanks for sharing your thoughts. As I mentioned originally, I am
> > not
> > > > >>> sure about the right log level either. Your arguments are
> > convincing
> > > --
> > > > >>> thus, I am fine with keeping WARN level.
> > > > >>>
> > > > >>> The task vs thread level argument is an interesting one.
> However, I
> > > am
> > > > >>> wondering if we should add this information into the
> corresponding
> > > WARN
> > > > >>> logs that we write anyway? For this case, we can also log the
> >

Build failed in Jenkins: kafka-trunk-jdk8 #2523

2018-04-03 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-6739; Ignore headers when down-converting from V2 to V0/V1 (#4813)

--
[...truncated 422.24 KB...]

kafka.controller.ReplicaStateMachineTest > 
testInvalidOfflineReplicaToReplicaDeletionIneligibleTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testNewReplicaToOfflineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testNewReplicaToOfflineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToOnlineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToOnlineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToReplicaDeletionStartedTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToReplicaDeletionStartedTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToReplicaDeletionSuccessfulTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToReplicaDeletionSuccessfulTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOfflineReplicaToNonexistentReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOfflineReplicaToNonexistentReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToReplicaDeletionIneligibleTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToReplicaDeletionIneligibleTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToReplicaDeletionStartedTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToReplicaDeletionStartedTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionSuccessfulTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionSuccessfulTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionIneligibleToReplicaDeletionStartedTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionIneligibleToReplicaDeletionStartedTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionStartedToOfflineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionStartedToOfflineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionStartedTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionStartedTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testOnlineReplicaToOnlineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testOnlineReplicaToOnlineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionIneligibleTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNewReplicaToReplicaDeletionIneligibleTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToOfflineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToOfflineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionStartedToReplicaDeletionSuccessfulTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionStartedToReplicaDeletionSuccessfulTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToOnlineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidNonexistentReplicaToOnlineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToNewReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidOnlineReplicaToNewReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionStartedToReplicaDeletionIneligibleTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionStartedToReplicaDeletionIneligibleTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToOfflineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionSuccessfulToOfflineReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionIneligibleToNewReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testInvalidReplicaDeletionIneligibleToNewReplicaTransition PASSED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionIneligibleToOnlineReplicaTransition STARTED

kafka.controller.ReplicaStateMachineTest > 
testReplicaDeletionIneligibleT

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
Sounds great!

The cryptic topic names can be an issue -- however, people can `describe()`
their topology to map the name to the corresponding sub-topology/tasks and
narrow the error down to the corresponding operators. I think this should be
"sufficient for now" for debugging.

Renaming those topics seems to be out of scope for this KIP.
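
A hedged sketch of the describe() approach mentioned above (the surrounding code
is illustrative):

    // imports assumed: org.apache.kafka.streams.StreamsBuilder, org.apache.kafka.streams.Topology
    StreamsBuilder builder = new StreamsBuilder();
    // ... define the DSL program ...
    Topology topology = builder.build();
    // prints the sub-topologies, processor nodes, and internal (e.g. repartition)
    // topic names, which can then be matched against the names in the WARN logs
    System.out.println(topology.describe());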


-Matthias

On 4/3/18 2:45 PM, Guozhang Wang wrote:
> Thanks John, your proposal looks fine to me.
> 
> I'll go ahead and look into the PR for more details myself.
> 
> 
> Guozhang
> 
> On Tue, Apr 3, 2018 at 1:35 PM, Bill Bejeck  wrote:
> 
>> Hi John,
>>
>> Thanks for making the updates.
>>
>> I agree with the information you've included in the logs as described
>> above, as log statements without enough context/information can be
>> frustrating.
>>
>> -Bill
>>
>> On Tue, Apr 3, 2018 at 3:29 PM, John Roesler  wrote:
>>
>>> Allrighty, how about this, then...
>>>
>>> I'll move the metric back to the StreamThread and maintain the existing
>> tag
>>> (client-id=...(per-thread client-id)). It won't be present in the
>>> TopologyTestDriver's metrics.
>>>
>>> As a side note, I'm not sure that the location of the log messages has
>>> visibility into the name of the thread or the task, or the processor
>> node,
>>> for that matter. But at the end of the day, I don't think it really
>>> matters.
>>>
>>> None of those identifiers are in the public interface or user-controlled.
>>> For them to be useful for debugging, users would have to gain a very deep
>>> understanding of how their DSL program gets executed. From my
>> perspective,
>>> they are all included in metric tags only to prevent collisions between
>> the
>>> same metrics in different (e.g.) threads.
>>>
>>> I think what's important is to provide the right information in the logs
>>> that users will be able to debug their issues. This is why the logs in my
>>> pr include the topic/partition/offset of the offending data, as well as
>> the
>>> stacktrace of the exception from the deserializer (or for timestamps, the
>>> extracted timestamp and the class name of their extractor). This
>>> information alone should let them pinpoint the offending data and fix it.
>>>
>>> (I am aware that that topic name might be a repartition topic, and
>>> therefore also esoteric from the user's perspective, but I think it's the
>>> best we can do right now. It might be nice to explicitly take on a
>>> debugging ergonomics task in the future and give all processor nodes
>>> human-friendly names. Then, we could surface these names in any logs or
>>> exceptions. But I'm inclined to call this out-of-scope for now.)
>>>
>>> Thanks again,
>>> -John
>>>
>>> On Tue, Apr 3, 2018 at 1:40 PM, Guozhang Wang 
>> wrote:
>>>
 1. If we can indeed gather all the context information from the log4j
 entries I'd suggest we change to thread-level (I'm not sure if that is
 doable, so if John have already some WIP PR that can help us decide).

 2. We can consider adding the API in TopologyTestDriver for general
>>> testing
 purposes; that being said, I think Matthias has a good point that this
 alone should not be a driving motivation for us to keep this metric as
 task-level if 1) is true.



 Guozhang


 On Tue, Apr 3, 2018 at 11:36 AM, Matthias J. Sax <
>> matth...@confluent.io>
 wrote:

> Thanks Guozhang, that was my intent.
>
> @John: yes, we should not nail down the exact log message. It's just
>> to
> point out the trade-off. If we can get the required information in
>> the
> logs, we might not need task level metrics.
>
>
> -Matthias
>
> On 4/3/18 11:26 AM, Guozhang Wang wrote:
>> I think Matthias' comment is that, we can still record the metrics
>> on
 the
>> thread-level, while having the WARN log entry to include sufficient
> context
>> information so that users can still easily narrow down the
 investigation
>> scope.
>>
>>
>> Guozhang
>>
>> On Tue, Apr 3, 2018 at 11:22 AM, John Roesler 
 wrote:
>>
>>> I agree we should add as much information as is reasonable to the
>>> log.
> For
>>> example, see this WIP PR I started for this KIP:
>>>
>>> https://github.com/apache/kafka/pull/4812/files#diff-
>>> 88d129f048bc842c7db5b2566a45fce8R80
>>>
>>> and
>>>
>>> https://github.com/apache/kafka/pull/4812/files#diff-
>>> 69e6789eb675ec978a1abd24fed96eb1R111
>>>
>>> I'm not sure if we should nail down the log messages in the KIP or
>>> in
> the
>>> PR discussion. What say you?
>>>
>>> Thanks,
>>> -John
>>>
>>> On Tue, Apr 3, 2018 at 12:20 AM, Matthias J. Sax <
 matth...@confluent.io
>>
>>> wrote:
>>>
 Thanks for sharing your thoughts. As I mentioned originally, I am
>>> not
 sure about the right log level either. Your arguments are
>>> convincing
 --
 thus, I am fine with keeping 

Jenkins build is back to normal : kafka-trunk-jdk9 #529

2018-04-03 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-249: Add Delegation Token Operations to Kafka Admin Client

2018-04-03 Thread Manikumar
Hi All,

The vote has passed with 3 binding votes (Jun, Gwen, Rajini) and 2
non-binding votes (Satish, Viktor).

Thanks everyone for the votes.

Thanks,
Manikumar

On Tue, Apr 3, 2018 at 2:46 PM, Rajini Sivaram 
wrote:

> +1 (binding)
>
> Thanks for the KIP, Manikumar!
>
> On Thu, Mar 29, 2018 at 5:34 PM, Gwen Shapira  wrote:
>
> > +1
> >
> > Thank you and sorry for missing it the first time around.
> >
> > On Thu, Mar 29, 2018 at 3:05 AM, Manikumar 
> > wrote:
> >
> > > I'm bumping this up to get some attention.
> > >
> > >
> > > On Wed, Jan 24, 2018 at 3:36 PM, Satish Duggana <
> > satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > >  +1, thanks for the KIP.
> > > >
> > > > ~Satish.
> > > >
> > > > On Wed, Jan 24, 2018 at 5:09 AM, Jun Rao  wrote:
> > > >
> > > > > Hi, Mani,
> > > > >
> > > > > Thanks for the KIP. +1
> > > > >
> > > > > Jun
> > > > >
> > > > > On Sun, Jan 21, 2018 at 7:44 AM, Manikumar <
> > manikumar.re...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to start a vote on KIP-249 which would add
> delegation
> > > > token
> > > > > > operations
> > > > > > to Java Admin Client.
> > > > > >
> > > > > > We have merged DelegationToken API PR recently. We want to
> include
> > > > admin
> > > > > > client changes in the upcoming release. This will make the
> feature
> > > > > > complete.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter  | blog
> > 
> >
>


Jenkins build is back to normal : kafka-trunk-jdk8 #2524

2018-04-03 Thread Apache Jenkins Server
See