[DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler
Dear Kafka community,

I am proposing KIP-267 to augment the public Streams test utils API.
The goal is to simplify testing of Kafka Streams applications.

Please find details in the wiki:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-267%3A+Add+Processor+Unit+Test+Support+to+Kafka+Streams+Test+Utils

An initial WIP PR can be found here: https://github.com/apache/kafka/pull/4662

I also included the user-list (please hit "reply-all" to include both
lists in this KIP discussion).

Thanks,

-John


Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler
Thanks Ted,

Sure thing; I updated the example code in the KIP with a little snippet.

-John

On Wed, Mar 7, 2018 at 7:18 PM, Ted Yu  wrote:

> Looks good.
>
> See if you can add punctuator into the sample code.


Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler


Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
Thanks for the review, Guozhang,

In response:
1. I missed that! I'll look into it and update the KIP.

2. I was planning to use the real implementation, since folks might
register some metrics in the processors and want to verify the values that
get recorded. If the concern is about initializing all the stuff that's in
the Metrics object, I can instantiate it lazily or even make it optional by
taking a nullable constructor parameter.

3. Agreed. I think that's the real sharp edge here. I actually think it
would be neat to auto-trigger those scheduled punctuators, but it seems
like that moves this component out of "mock" territory and into "driver"
territory. Since we already have the TopologyTestDriver, I'd prefer to
focus on keeping the mock lean. I agree it should be in the javadoc as well
as the web documentation.
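To make the mock-vs-driver distinction concrete, here is a minimal, self-contained sketch. This is not the proposed API; the class and method names are invented for illustration. The point is that a mock only *captures* calls for inspection and never auto-triggers punctuators or forwards to real child nodes (that is the driver's job):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mock ProcessorContext: it records what the
// processor under test did, but performs no real work itself.
public class CapturingContext {
    public static class CapturedForward {
        public final Object key;
        public final Object value;
        CapturedForward(Object key, Object value) { this.key = key; this.value = value; }
    }

    public static class CapturedPunctuator {
        public final long intervalMs;
        CapturedPunctuator(long intervalMs) { this.intervalMs = intervalMs; }
    }

    private final List<CapturedForward> forwards = new ArrayList<>();
    private final List<CapturedPunctuator> punctuators = new ArrayList<>();

    // A processor calls these during init()/process()...
    public void forward(Object key, Object value) { forwards.add(new CapturedForward(key, value)); }
    public void schedule(long intervalMs) { punctuators.add(new CapturedPunctuator(intervalMs)); }

    // ...and the test reads these afterwards to verify the behavior.
    public List<CapturedForward> forwarded() { return forwards; }
    public List<CapturedPunctuator> scheduledPunctuators() { return punctuators; }

    public static void main(String[] args) {
        CapturingContext ctx = new CapturingContext();
        ctx.schedule(60_000L);   // what a processor under test might do in init()
        ctx.forward("k", "v");   // ...and in process()
        // The test only checks that a punctuator was *registered*; it never fires.
        if (ctx.scheduledPunctuators().size() != 1) throw new AssertionError();
        if (ctx.forwarded().size() != 1) throw new AssertionError();
        System.out.println("punctuators registered: " + ctx.scheduledPunctuators().size());
    }
}
```

A unit test would instantiate its processor with such a context, drive it by hand, and assert on the captured calls; actually firing scheduled punctuators would be TopologyTestDriver's job.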

Thanks,
-John

On Thu, Mar 8, 2018 at 1:46 PM, Guozhang Wang  wrote:

> Hello John,
>
> Thanks for the KIP. I made a pass over the wiki page and here are some
> comments:
>
> 1. Meta-comment: there is an internal class MockProcessorContext under the
> o.a.k.test package, which should be replaced as part of this KIP.
>
> 2. In @Override StreamsMetrics metrics(), will you return a fully created
> StreamsMetricsImpl object or are you planning to use the
> MockStreamsMetrics? Note that for the latter case you probably need to look
> into https://issues.apache.org/jira/browse/KAFKA-5676 as well.
>
> 3. Not related to the KIP changes themselves: about
> "context.scheduledPunctuators": we need to document clearly that in the
> MockProcessorContext the scheduled punctuator will never be auto-triggered,
> and hence it is only for testing that people's code has indeed registered
> some punctuators; if people want full auto-punctuation testing they
> have to go with TopologyTestDriver.
>
>
>
> Guozhang


Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
Actually, replacing the MockProcessorContext in o.a.k.test could be a bit
tricky, since it would make the "streams" module depend on
"streams:test-utils", but "streams:test-utils" already depends on "streams".

At first glance, it seems like the options are:
1. leave the two separate implementations in place. This option shouldn't be
dismissed, especially since our internal tests may need different
things from a mocked P.C. than our API users.
2. move the public testing artifacts into the regular streams module
3. move the unit tests for Streams into a third module that depends on both
streams and test-utils. Yuck!

Thanks,
-John



Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
Thanks, Matthias,

1. I can move it into the o.a.k.streams.processor package; that makes sense.

2. I'm expecting most users to use in-memory state stores, so they won't
need a state directory. In the "real" code path, the stateDir is extracted
from the config
by org.apache.kafka.streams.processor.internals.StateDirectory. The logic
is non-trivial and invoking it directly will result in the state directory
actually being created. Given my assumption that you don't need it most of
the time, creating directories seems too heavy to me.

3. I'm on the fence about that. It's not too much trouble to implement it,
even if it is deprecated from day 1, so I think I'd rather put it in and
let us remove it later when we actually remove the deprecated method. In
contrast, we actually would have to jump through some hoops to support
schedule(interval).

On Thu, Mar 8, 2018 at 3:36 PM, Matthias J. Sax 
wrote:

> Thanks for the KIP John.
>
> Couple of minor questions:
>
> - What about putting the mock into sub-package `processor` so it's in
> the same package name as the interface it implements?
>
> - What is the purpose of the constructor taking the `File stateDir`
> argument? The state directory should be encoded in the `Properties
> config` parameter already.
>
> - We have KIP-251 in place (not voted yet though) that plans to
> deprecate `forward(K key, V value, int childIndex)` and `forward(K key,
> V value, String childName)` -- should we also throw
> UnsupportedOperationException similar to `schedule(long)` if KIP-251 is
> accepted?
>
>
> -Matthias


Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
I think what you're suggesting is to:
1. compile the main streams code, but not the tests
2. compile test-utils (and compile and run the test-utils tests)
3. compile and run the streams tests

This works in theory, since the test-utils depends on the main streams
code, but not the streams tests, and the streams tests depend on test-utils
while the main streams code does not.

But after poking around a bit and reading up on it, I think this is not
possible, or at least not mainstream.

The issue is that dependencies are formed between projects, in this case
streams and streams:test-utils. The upstream project must be built before
the dependant one, regardless of whether the dependency is for compiling
the main code or the test code. This means we do have a circular dependency
on our hands if we want the tests in streams to use the test-utils, since
they'd both have to be built before the other.

Gradle seems to be quite scriptable, so there may be some way to achieve
this, but increasing the complexity of the build also introduces a project
maintenance concern.

The MockProcessorContext itself is pretty simple, so I'm tempted to argue
that we should just have one for internal unit tests and another for
test-utils, however this situation also afflicts KAFKA-6474
<https://issues.apache.org/jira/browse/KAFKA-6474>, and the
TopologyTestDriver is not so trivial.

I think the best thing at this point is to go ahead and fold the test-utils
into the streams project. We can put it into a separate "testutils" package
to make it easy to identify which code is for test support and which code
is Kafka Streams. The biggest bummer about this suggestion is that we
*just* introduced the test-utils artifact, so folks would add that
artifact in 1.1 to write their tests and then have to drop it again in 1.2.

The other major solution is to create a new gradle project for the streams
unit tests, which depends on streams and test-utils and move all the
streams unit tests there. I'm pretty sure we can configure gradle just to
include this project for running tests and not actually package any
artifacts. This structure basically expresses your observation that the
test code is essentially a separate module from the main streams code.
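A sketch of what that layout could look like in Gradle. The module names and file placement here are illustrative, not a worked-out build:

```groovy
// settings.gradle -- hypothetical module layout:
include 'streams'              // main Kafka Streams code; no test-utils dependency
include 'streams:test-utils'   // published test-support artifact; depends on 'streams'
include 'streams:tests'        // unit tests only; depends on both; never published

// streams/tests/build.gradle -- the test-only module breaks the cycle:
dependencies {
    testCompile project(':streams')
    testCompile project(':streams:test-utils')
}
```

Because 'streams:tests' publishes no artifact, the cycle between the main code and the test support never appears in any published dependency graph.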

Of course, I'm open to alternatives, especially if someone with more
experience in Gradle is aware of a solution.

Thanks,
-John


On Thu, Mar 8, 2018 at 3:39 PM, Matthias J. Sax 
wrote:

> Isn't MockProcessorContext in o.a.k.test part of the unit-test package
> but not the main package?
>
> This should resolve the dependency issue.
>
> -Matthias

Re: Delayed processing

2018-03-09 Thread John Roesler
Hi Wim,

One off-the-cuff idea is that you maybe don't need to actually delay
anonymizing the data. Instead, you can just create a separate pathway that
immediately anonymizes the data. Something like this:

(raw-input topic, GDPR retention period set)
 |\->[streams apps that need non-anonymized data]
 |
 \->[anonymizer app]->(anonymized-input topic, arbitrary retention period)
      ->[any apps that can handle anonymous data]

This way, you can keep your anonymized-input topic forever if you want,
using log compaction. You also get a very clean data lineage, so you know
for sure which components are viewing non-anonymized data (because they
consume the raw-input topic) versus the ones that are "safe" according to
GDPR, since they consume only the anonymized-input topic.
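In the DSL, the [anonymizer app] box could be little more than a mapValues() between the two topics. Its core masking function, with a record format and rule invented purely for illustration, might look like:

```java
import java.util.regex.Pattern;

// A toy anonymizer for the [anonymizer app] in the sketch above.
// The record format and masking rule are made up; a real CDR anonymizer
// would be driven by the actual schema and GDPR requirements.
public class Anonymizer {
    private static final Pattern DIGITS = Pattern.compile("\\d");

    // In a Streams app this could be the body of a mapValues() call, e.g.:
    //   raw.mapValues(Anonymizer::anonymize)
    public static String anonymize(String callRecord) {
        // crude rule: mask every digit in the record
        return DIGITS.matcher(callRecord).replaceAll("*");
    }

    public static void main(String[] args) {
        // prints: caller=*****;duration=**s
        System.out.println(anonymize("caller=12345;duration=42s"));
    }
}
```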

For downstream apps that are happy using anonymized input, it seems like it
wouldn't matter whether the input is anonymized right away or 6 months
delayed.

And since you know clearly which components may be "dirty" because they are
downstream of raw-input, you can audit those components and make sure they
are either stateless or that they also have proper retention periods set.

Of course, without knowing the details, I can't say whether this actually
works for you, but I wanted to share the thought.

Thanks,
-John

On Thu, Mar 8, 2018 at 11:21 PM, adrien ruffie 
wrote:

> Hi Wim,
> this topic (processing order) has been cropping up for a while; several
> articles, benchmarks, and other material on the subject reach this
> conclusion.
>
> That said, you can of course ask someone else for another opinion on the
> subject.
>
> regards,
>
> Adrien
>
> 
> De : Wim Van Leuven 
> Envoyé : vendredi 9 mars 2018 08:03:15
> À : users@kafka.apache.org
> Objet : Re: Delayed processing
>
> Hey Adrien,
>
> thank you for the elaborate explanation!
>
> We are ingesting call data records here, which due to the nature of a telco
> network might not arrive in absolute logical order.
>
> If I understand your explanation correctly, you are saying that with your
> setup, Kafka guarantees the processing in order of ingestion of the
> messages. Correct?
>
> Thanks!
> -wim
>
> On Thu, 8 Mar 2018 at 22:58 adrien ruffie 
> wrote:
>
> > Hello Wim,
> >
> >
> > It does matter (I think), because one of the big and principal features
> > of Kafka is to do load balancing of messages and guarantee ordering in a
> > distributed cluster.
> >
> >
> > The order of the messages should be guaranteed, unless several cases:
> >
> > 1] Producer can cause data loss when, block.on.buffer.full = false,
> > retries are exhausted and sending message without using acks=all
> >
> > 2] unclean leader election enabled: because if a follower (out of
> > sync) becomes the new leader, messages that were not synced to the new
> > leader are lost.
> >
> >
> > message reordering might happen when:
> >
> > 1] max.in.flight.requests.per.connection > 1 and retries are enabled
> >
> > 2] when a producer is not correctly closed, i.e., without calling .close(),
> >
> > because the close method ensures that the accumulator is closed first,
> > guaranteeing that no more appends are accepted after breaking the send
> > loop.
> >
> >
> >
> > If you want to avoid these cases:
> >
> > - close producer in the callback error
> >
> > - close producer with close(0) to prevent sending after previous message
> > send failed
> >
> >
> > Avoid data loss:
> >
> > - block.on.buffer.full=true
> >
> > - retries=Long.MAX_VALUE
> >
> > - acks=all
> >
> >
> > Avoid reordering:
> >
> > max.in.flight.requests.per.connection=1 (be aware of the latency impact)
> >
> >
> > Note that if your producer goes down, messages in its buffer will
> > still be lost ... (perhaps manage local storage if you are punctilious).
> >
> > Moreover, at least two replicas are needed at any time to guarantee data
> > persistence, e.g. replication factor = 3, min.isr = 2, unclean leader
> > election disabled.
> >
> >
> > Also keep in mind that a consumer can lose messages when offsets are not
> > correctly committed. Disable auto-commit (enable.auto.commit=false) and
> > commit offsets only after doing your work for each message (or commit
> > several processed messages at a time, kept in a local memory buffer).
> >
> >
> > I hope, these previous suggestions help you 😊
> >
> >
> > Best regards,
> >
> > Adrien
> >
> > 
> > De : Wim Van Leuven 
> > Envoyé : jeudi 8 mars 2018 21:35:13
> > À : users@kafka.apache.org
> > Objet : Delayed processing
> >
> > Hello,
> >
> > I'm wondering how to design a KStreams or regular Kafka application that
> > can hold off processing of messages until a future time.
> >
> > This relates to the EU's data protection regulation: we can store raw
> > messages for a given time; afterwards we have to store the anonymised
> > message. So, I was thinking about branching the stream, anonymising the
> > messages into a waiting topic, and then continuing from there until the re
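For reference, Adrien's producer settings quoted above can be gathered into one config sketch. Key names follow the thread; some of them changed across Kafka versions, so treat this as an era-specific illustration rather than current advice:

```java
import java.util.Properties;

// The producer settings quoted above, collected in one place.
// Note: block.on.buffer.full was later deprecated in favor of max.block.ms,
// so check the config reference for your client version.
public class SafeProducerConfig {
    public static Properties props() {
        Properties p = new Properties();
        // avoid data loss
        p.put("acks", "all");
        p.put("retries", Long.toString(Long.MAX_VALUE));
        p.put("block.on.buffer.full", "true");
        // avoid reordering (at a latency cost)
        p.put("max.in.flight.requests.per.connection", "1");
        return p;
    }

    public static void main(String[] args) {
        props().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```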

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-09 Thread John Roesler
Sweet! I think this pretty much wraps up all the discussion points.

I'll update the KIP with all the relevant aspects we discussed and call for
a vote.

I'll also comment on the TopologyTestDriver ticket noting this modular test
strategy.

Thanks, everyone.
-John

On Fri, Mar 9, 2018 at 10:57 AM, Guozhang Wang  wrote:

> Hey John,
>
> Re: Mock Processor Context:
>
> That's a good point, I'm convinced that we should keep them as two classes.
>
>
> Re: test-utils module:
>
> I think I agree with your proposed changes, in fact in order to not scatter
> the test classes in two places maybe it's better to move all of them to the
> new module. One caveat is that it will make streams' project hierarchy
> inconsistent with other projects where the unit test classes are maintained
> inside the main artifact package, but I think it is a good cost to pay,
> plus once we start publishing test-util artifacts for other projects like
> client and connect, we may face the same issue and need to do this
> refactoring as well.
>
>
>
> Guozhang
>
>
>
>
> On Fri, Mar 9, 2018 at 9:54 AM, John Roesler  wrote:
>
> > Hi Guozhang and Bill,
> >
> > I'll summarize what I'm currently thinking in light of all the
> discussion:
> >
> > Mock Processor Context:
> > ===
> >
> > Here's how I see the use cases for the two mocks differing:
> >
> > 1. o.a.k.test.MPC: Crafted for testing Streams use cases. Implements
> > AbstractProcessorContext, actually forward to child processor nodes,
> allow
> > restoring a state store. Most importantly, the freedom to do stuff
> > convenient for our tests without impacting anyone.
> >
> > 2. (test-utils) MPC: Crafted for testing community Processors (and
> > friends). Very flat and simple implementation (so people can read it in
> one
> > sitting); i.e., doesn't drag in other data models like RecordContext.
> Test
> > one processor in isolation, so generally don't bother with complex logic
> > like scheduling punctuators, forwarding results, or restoring state
> stores.
> > Most importantly, an API that can be stable.
> >
> > So, I really am leaning toward keeping both implementations. I like
> Bill's
> > suggestion of renaming the unit testing class to
> > InternalMockProcessorContext, since having classes with the same name in
> > different packages is confusing. I look forward to the day when Java 9
> > takes off and we can actually hide internal classes from the public
> > interface.
> >
> > test-utils module:
> > =
> >
> > This is actually out of scope for this KIP if we keep both MPC
> > implementations, but it has been a major feature of this discussion, so
> we
> > may as well see it though.
> >
> > I've waffled a bit on this point, but right now I would propose we
> > restructure the streams directory thusly:
> >
> > streams/ (artifact name := "streams", the actual streams code lives here)
> > - test-utils/ (this is the current test-utils artifact, depends on
> > "streams")
> > - tests/ (new module, depends on "streams" and "test-utils", *NO
> published
> > artifact*)
> >
> > This gets us out of the circular dependency without having to engage in
> any
> > Gradle shenanigans while preserving "test-utils" as a separate artifact.
> > This is good because: 1) the test-utils don't need to be in production
> > code, so it's nice to have a separate artifact, 2) test-utils is already
> > public in 1.1, and it's a bummer to break users' code when we can so
> > easily avoid it.
> >
> > Note, though, that if we agree to keep both MPC implementations, then
> this
> > really is just important for rewriting our tests to use
> TopologyTestDriver,
> > and in fact only the tests that need it should move to "streams/tests/".
> >
> > What say you?
> >
> > -John
> >
> > On Fri, Mar 9, 2018 at 9:01 AM, Guozhang Wang 
> wrote:
> >
> > > Hmm.. it seems to be a general issue then, since we were planning to
> also
> > > replace the KStreamTestDriver and ProcessorTopologyTestDriver with the
> > new
> > > TopologyTestDriver soon, so if the argument holds that a test dependency
> > > could still cause circular dependencies, it means we cannot do that as
> > > well.
> > >
> > > My understanding on gradle dependencies has been that test dependencies
> > are
> > > not required to comp

Re: Workaround for KTable/KTable join followed by groupBy and aggregate/count can result in duplicated results KAFKA-4609?

2018-04-30 Thread John Roesler
Hello Artur,

Apologies in advance if I say something incorrect, as I'm still a little
new to this project.

If I followed your example, then I think the scenario is that you're
joining "claims" and "payments", grouping by "claimNumber", and then
building a list for each "claimNumber" of all the claim/payment pairs. Is
that right?

It's not in your example, but the grouped stream or table for "claims"
(claimStrGrouped) and "payments" (paymentGrouped) must be keyed with the
same key, right? In that case, the result of their join will also be keyed
by that same key.

It seems like the problem you're seeing is that that list contains the same
claim/payment pair multiple times for a given claimNumber. Did I get it?

In that case, I don't know if what you're seeing is the same issue Damian
reported in KAFKA-4609, since the problem he reported was that there was no
deduping cache after the join, only before it, unless you register a state
store representing the join itself. In your case, it looks like you do
register a state store representing the join, the
"CLAIM_AND_PAYMENT_JOIN_STORE".
So you will have a cache that can dedup the join result.

Note that the join itself is what causes duplicates, not the subsequent "
claimAndPaymentKTable.toStream()". For example, if I see input like this:

(left stream):
t1: k1 -> L1
t3: k1 -> L1

(right stream):
t2: k1 -> R1

Then, without deduplication, the resulting join would be:
(left.join(right) stream):
t1: k1 -> (L1, null)
t2: k1 -> (L1, R1)
t3: k1 -> (L1, R1)

Note that we see apparently duplicate join results, but really the meaning
of the join stream is that "as of right now, this is the value for this
key", so from the join's perspective it's not wrong to say "as of t2, k1's
value is (L1, R1)" and then to say it at t3 again.

In Kafka Streams, there is a deduplication cache which can reduce such
duplicate events, but without unbounded memory, the cache can't guarantee
to remove all duplicates, so it's important to deal with the join result in
a semantically robust way.

I think this also contains the key to resolving your issue; inside your
aggregator, instead of storing a list of *every event*, I think you'll want
to store a map of the *latest event by key*. (This would be the key that's
common to claimStrGrouped, paymentGrouped, and claimAndPaymentKTable). This
way, you'll automatically overwrite old, obsolete, join results with new
ones for the same key (whether or not the old result happens to be the same
as the new one).

Does this help?
-John

On Mon, Apr 30, 2018 at 1:19 AM, Artur Mrozowski  wrote:

> Hi,
> a while ago I hit KAFKA-4609 when running a simple pipeline. I have two
> KTable joins followed by groupBy and aggregate on a KStream, and one
> additional join. Now this KTable/KTable join followed by groupBy and
> aggregate generates duplicates.
>
>
>
> I wonder if a possible workaround would be to remove the KStream after
> KTable/KTable join and make groupBy and aggregate  on the KTable?
>
>
> KTable customerAndPolicyGroupedKTable =
>     customerGrouped.leftJoin(policyGrouped,
>         (customer, policy) -> new CustomerAndPolicy(customer, policy));
>
> KTable claimAndPaymentKTable =
>     claimStrGrouped.leftJoin(paymentGrouped,
>         (claim, payment) -> new ClaimAndPayment(claim, payment),
>         claimAndPaymentSerde, CLAIM_AND_PAYMENT_JOIN_STORE);
>
> *KStream claimAndPaymentKStream = claimAndPaymentKTable.toStream();
> //Can we remove this and avoid KAFKA-4609?*
>
> KTable claimAndPayment2IntGroupedTable =
>     claimAndPaymentKStream
>         .groupBy((k, claimPay) ->
>             (claimPay.claimList != null)
>                 ? Integer.parseInt(claimPay.claimList.claimRecords.get(0).claimnumber.split("_")[0])
>                 : 999,
>             integerSerde, claimAndPaymentSerde)
>         .aggregate(
>             ClaimAndPayment2::new,
>             (claimKey, claimPay, claimAndPay2) -> {
>                 claimAndPay2.claimAndPaymentList.add(claimPay);
>                 return claimAndPay2;
>             },
>             claimAndPayment2Serde,
>             CLAIM_AND_PAYMENT_STORE);
>
>
>
>
>
> Best regards
> Artur Mrozowski
>


Re: Workaround for KTable/KTable join followed by groupBy and aggregate/count can result in duplicated results KAFKA-4609?

2018-05-01 Thread John Roesler
Hi Artur,

Thanks for the clarification.

I don't think that ".toStream()" actually does anything besides change the
context from KTable to KStream in the DSL. The Javadoc says:

* Note that this is a logical operation and only changes the
> "interpretation" of the stream, i.e., each record of
> * this changelog stream is no longer treated as an updated record (cf.
> {@link KStream} vs {@code KTable}).


Not to belabor the point, but I wouldn't want you to focus too much on
getting rid of the "toStream" in favor of the same methods on KTable,
as I think that would have the exact same semantics.

It's entirely possible that some additional tuning on the join could reduce
the duplicates you're seeing. For example, what are your current settings
for commit interval and dedup cache size?

In any case, though, Kafka Streams's deduplication mechanism is only
best-effort. So if your correctness depends on unique events (as yours
does), I still think you're better off coding in anticipation of
duplicates. For example, you could implement hashCode and equals on
ClaimAndPayment and store them in a LinkedHashSet (to preserve both
uniqueness and order).
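For instance, a minimal sketch of that approach (the fields here are illustrative; your real ClaimAndPayment holds lists):

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Value-based equality lets a LinkedHashSet absorb duplicate join results
// while preserving the order in which distinct results arrived.
class ClaimAndPayment {
    final String claimNumber;
    final double payment;

    ClaimAndPayment(String claimNumber, double payment) {
        this.claimNumber = claimNumber;
        this.payment = payment;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ClaimAndPayment)) return false;
        ClaimAndPayment that = (ClaimAndPayment) o;
        return claimNumber.equals(that.claimNumber)
                && Double.compare(payment, that.payment) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(claimNumber, payment);
    }
}
```

Adding the same join result twice to a LinkedHashSet then leaves only one copy.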

Hope that helps,
-John


On Tue, May 1, 2018 at 12:40 PM, Artur Mrozowski  wrote:

> Hi John,
> yes, the answer is very helpful and your understanding of the data flow is
> correct. Although, deduplication is not the issue because there will not be
> any duplicates inserted into the flow.
> These, the duplicates will be generated, from unique records after the join
> between claim and payments and converting the result to stream.
> But perhaps that stream is entirely avoidable?
>
> So it would look something like this:
>
> KTable left
> {"claimcounter": 0, "claimreporttime": 55948.33110985625, "claimnumber":
> "3_0", "claimtime": 708.521153490306}
>
> and KTable  right
>
> {"claimcounter": 0, "paytime": 55960.226718985265, "claimnumber": "3_0",
> "payment": 847015.1437781961}
>
> When I leftJoin these two objects, the result in the state store will be an
> object containing two ArrayLists, left and right, like this:
>
> {"claimList":{"lst":[{"claimnumber":"3_0","claimtime":"708.521153490306",
> "claimreporttime":"55948.33110985625","claimcounter":"0"}]},
> "paymentList":{"lst":[{"payment":847015.1437781961,
> "paytime":55960.226718985265,"claimcounter":0,"claimnumber":"3_0"}]}}
>
> But I want to continue processing the results by using groupBy and
> aggregate, so I convert the result of the leftJoin to a stream. Now the
> resulting repartition and changelog topics will contain two identical
> messages, like this:
>
> {"claimList":{"lst":[{"claimnumber":"3_0","claimtime":"708.521153490306",
> "claimreporttime":"55948.33110985625","claimcounter":"0"}]},
> "paymentList":{"lst":[{"payment":847015.1437781961,
> "paytime":55960.226718985265,"claimcounter":0,"claimnumber":"3_0"}]}}
> {"claimList":{"lst":[{"claimnumber":"3_0","claimtime":"708.521153490306",
> "claimreporttime":"55948.33110985625","claimcounter":"0"}]},
> "paymentList":{"lst":[{"payment":847015.1437781961,
> "paytime":55960.226718985265,"claimcounter":0,"claimnumber":"3_0"}]}}
>
> Best regards
> Artur
>
>
> On Mon, Apr 30, 2018 at 5:30 PM, John Roesler  wrote:
>
> > Hello Artur,
> >
> > Apologies in advance if I say something incorrect, as I'm still a little
> > new to this project.
> >
> > If I followed your example, then I think the scenario is that you're
> > joining "claims" and "payments", grouping by "claimNumber", and then
> > building a list for each "claimNumber" of all the claim/payment pairs. Is
> > that right?
> >
> > It's not in your example, but the grouped stream or table for "claims"
> > (claimStrGrouped) and "payments" (paymentGrouped) must be keyed with the
> > same key, right? In that case, the result of their join will also be
> > keyed by that same key.
> >
> > It seems like the problem you're seeing is that that list contains the
> > same claim/payment pair multiple times for a given claimNumber. Did I

Re: Kafka Streams Produced Wrong (duplicated) Results with Simple Windowed Aggregation Case

2018-06-04 Thread John Roesler
Hi EC,

Thanks for the very clear report and question. Like Guozhang said this is
expected (but not ideal) behavior.

For an immediate work-around, you can try materializing the KTable and
setting the commit interval and cache size as discussed here (
https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/)
to reduce (but not eliminate) duplicates.
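For reference, those two knobs are ordinary Streams configs; a hedged sketch (the values here are arbitrary examples, not recommendations):

```java
import java.util.Properties;

// "commit.interval.ms" controls how often the record caches flush
// downstream; "cache.max.bytes.buffering" bounds the dedup cache size
// (setting it to 0 disables caching entirely).
Properties props = new Properties();
props.put("commit.interval.ms", "30000");           // flush every 30 seconds
props.put("cache.max.bytes.buffering", "10485760"); // 10 MB across all threads
```

A larger cache and a longer commit interval suppress more intermediate updates, at the cost of higher latency before results appear downstream.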

I'm in the process of getting my thoughts in order to write a KIP to
address this exact use case. If you're interested in participating in the
discussion, you can keep an eye on the dev mailing list or watch the KIP
page. I can't say when exactly I'll start it. I want to get it out there
soon, but I also want to do my homework and have a good proposal.

Thanks,
-John

On Mon, Jun 4, 2018 at 12:45 PM, Guozhang Wang  wrote:

> Hello,
>
> Your observation is correct, Kafka Streams by default will print continuous
> updates to each window, instead of waiting for the "final" update for each
> window.
>
> There is some ongoing work to provide the functionality to allow users to
> specify sth. like "give me the final result for windowed aggregations" in
> the DSL; it will probably come post the 2.0 release.
>
> Guozhang
>
>
> On Mon, Jun 4, 2018 at 8:14 AM, EC Boost  wrote:
>
> > Logged the internal windows information:
> >
> > Window{start=152804303, end=152804304} key=t6  1
> > Window{start=152804304, end=152804305} key=t1  2
> > Window{start=152804304, end=152804305} key=t7  3
> > Window{start=152804304, end=152804305} key=t5  4
> > Window{start=152804304, end=152804305} key=t5  4,5
> > Window{start=152804305, end=152804306} key=t6  6
> > Window{start=152804305, end=152804306} key=t6  6,7
> > Window{start=152804305, end=152804306} key=t4  8
> > Window{start=152804306, end=152804307} key=t6  9
> > Window{start=152804306, end=152804307} key=t7  10
> > Window{start=152804306, end=152804307} key=t6  9,11
> > Window{start=152804307, end=152804308} key=t5  12
> > Window{start=152804307, end=152804308} key=t6  13
> > Window{start=152804307, end=152804308} key=t4  14
> > Window{start=152804307, end=152804308} key=t4  14,15
> >
> > 
> >
> > It seems that Kafka Streams sends all the KTable changelog updates as
> > output, and that's probably why there are duplicate outputs for gap-less
> > non-overlapping windows.
> >
> > Is there any way to achieve real mini-batch-style processing semantics
> > using non-overlapping windows, meaning that only the last value will be
> > sent as output, not all the changelogs in the windows?
> >
> >
> > On Mon, Jun 4, 2018 at 1:25 AM, EC Boost  wrote:
> >
> > > Hello Everyone,
> > >
> > > I got duplicated results using kstreams for a simple windowed
> > > aggregation.
> > >
> > > The input event format is comma-separated: "event_id,event_type", and I
> > > need to aggregate them by event type.
> > >
> > > Following is the Kafka Stream processing logic:
> > >
> > > events
> > >   .map((k, v) -> KeyValue.pair(v.split(",")[1], v.split(",")[0]))
> > >   .groupByKey()
> > >   .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(10)))
> > >   .aggregate(
> > > ArrayList::new,
> > > (type, id, eventList) -> {
> > >   eventList.add(id);
> > >   return eventList;
> > > },
> > > Materialized.with(stringSerde, arraySerde)
> > >   )
> > >   .toStream((k,v) -> k.key())
> > >   .mapValues((v)-> String.join(",", v))
> > >   .to("ks-debug-output", Produced.with(stringSerde, stringSerde));
> > >
> > >
> > > I produced the input messages using the following snippet:
> > >
> > > require "kafka"
> > >
> > > kafka = Kafka.new(["localhost:9092"], client_id: "event-producer")
> > >
> > > f = File.open("events.txt")
> > > f.each_line { |l|
> > >   puts l
> > >   kafka.deliver_message("#{l.strip}", topic: "ks-debug-input")
> > >   sleep(3)
> > > }
> > >
> > >
> > >
> > > The messages in events.txt are the following (format:
> > > "event_id,event_type", where event_id is unique):
> > >
> > > Input
> > >
> > > 1,t6
> > > 2,t1
> > > 3,t7
> > > 4,t5
> > > 5,t5
> > > 6,t6
> > > 7,t6
> > > 8,t4
> > > 9,t6
> > > 10,t7
> > > 11,t6
> > > 12,t5
> > > 13,t6
> > > 14,t4
> > > 15,t4
> > > 16,t2
> > > 17,t7
> > > 18,t6
> > > 19,t3
> > > 20,t7
> > > 21,t1
> > > 22,t5
> > > 23,t5
> > > 24,t6
> > > 25,t6
> > > 26,t4
> > > 27,t4
> > > 28,t3
> > > 29,t2
> > > 30,t5
> > > 31,t1
> > > 32,t1
> > > 33,t1
> > > 34,t1
> > > 35,t2
> > > 36,t4
> > > 37,t3
> > > 38,t3
> > > 39,t6
> > > 40,t6
> > > 41,t1
> > > 42,t4
> > > 43,t4
> > > 44,t6
> > > 45,t6
> > > 46,t7
> > > 47,t7
> > > 48,t3
> > > 49,t1
> > > 50,t6
> > > 51,t1
> > > 52,t4
> > > 53,t6
> > > 54,t7
> > > 55,t1
> > > 56,t1
> > > 57,t1
> > > 58,t5
> > > 59,t6
> > > 60,t7
> > > 61,t6
> > > 62,t4
> > > 63,t5
> > > 64,t1
> > > 65,t3
> > > 66,t1
> > > 67,t3
> > > 68,t3
> > > 69,t5
> > > 70,t1
> > > 71,t6
> > > 72,t5
> > > 73,t6
> > > 74,t1

Re: Some Total and Rate metrics are not consistent

2018-06-21 Thread John Roesler
Hi Sam,

This sounds like a condition I fixed in
https://github.com/apache/kafka/commit/ed51b2cdf5bdac210a6904bead1a2ca6e8411406#diff-8b364ed2d0abd8e8ae21f5d322db6564R221
. I realized that the prior code creates a new Meter, which uses a Total
metric instead of a Count. But that would total all the values of the
metric, when instead what we want is to "total" the number of measurements
(aka count them).
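A toy illustration of the distinction (plain Java, not Kafka's actual metrics classes):

```java
// Given per-record processing latencies in nanoseconds, a "Total" sums the
// measured values, while a "Count" increments once per measurement. Using a
// Total where a Count was intended inflates the -total metric by the sum of
// the measured values (here, nanoseconds) rather than the number of records.
long[] latenciesNs = {500_000L, 800_000L, 650_000L};

long total = 0L; // what the buggy Meter accumulated
long count = 0L; // what process-total should report
for (long latencyNs : latenciesNs) {
    total += latencyNs;
    count += 1;
}
```

Three processed records yield a count of 3, but the summed latencies come to 1,950,000, which matches the symptom of process-total growing orders of magnitude faster than process-rate suggests.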

I just peeked at the 1.1 branch, and it seems this change made it in after
the 1.1 branch cut, so it would only be fixed in 2.0.

Thanks,
-John

On Wed, Jun 20, 2018 at 8:44 PM Guozhang Wang  wrote:

> Thanks for reporting this Sam, could you check and confirm if this issue is
> fixed in trunk? If not, we should file a JIRA.
>
>
> Guozhang
>
> On Wed, Jun 20, 2018 at 6:41 PM, Sam Lendle  wrote:
>
> > It looks like there is indeed a bug in kafka-streams 1.1.0. I think what
> > was happening was the time spent processing each record in ns was being
> > added to the total metric instead of incrementing by 1 for each record.
> > Looks like the implementation has been changed in trunk. I don't see any
> > commit messages mentioning this particular issue, but hopefully the
> > change fixes it.
> > --
> > *From:* Sam Lendle
> > *Sent:* Wednesday, June 20, 2018 6:10:03 PM
> > *To:* users@kafka.apache.org
> > *Subject:* Some Total and Rate metrics are not consistent
> >
> >
> > I’m trying to use the total metrics introduced in KIP-187 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics)
> >
> >
> >
> > For some metrics, the total and rates are not consistent. In particular,
> > for stream-processor-node-metrics, I’m seeing about 500-800 operations
> > per second in a particular streams thread/processor node as reported by
> > the process-rate metric, but the process-total metric is increasing by
> > about 100 million per second. See attached screenshot from VisualVM.
> >
> >
> >
> > Other metrics seem fine, for example forward-rate and forward-total
> > metrics under stream-processor-node-metrics are consistent.
> >
> >
> >
> > Am I misunderstanding the interpretation of the -total metrics? If this
> > is a bug, can I do anything in addition to this email to report it? File a
> > JIRA?
> >
> >
> > Best,
> > Sam
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> -- Guozhang
>


Re: Kafka Streams - Expiring Records By Process Time

2018-06-21 Thread John Roesler
Hi Sicheng,

I'm also curious about the details.

Let's say you are doing a simple count aggregation with 24-hour windows.
You got three events with key "A" on 2017-06-21, one year ago, so the
windowed key (A,2017-06-21) has a value of 3.

Fast-forward a year later. We get one late event, also for 2017-06-21. The
correct value of (A,2017-06-21) is now 4.

If you set your retention time to account for such lateness, say 2 years,
there's no problem. Streams still has the state "(A,2017-06-21): 3" in
storage, so it can update the value and emit "(A,2017-06-21): 4".

But if you have your retention shorter, let's say 1 month, then Streams
deleted that state long ago and no longer knows about those three prior
events. If it processes that event, it will incorrectly report
"(A,2017-06-21):1"
as the value for that windowed key.

So, to preserve correctness, you must either discard new events for expired
windows or set the retention higher than any lateness you'll observe.

On the other hand, you can use processing time, instead of event time, in
which case our new event actually belongs to today's window, 2018-06-21,
which is still retained.

But this whole thing only works out if you use the same notion of time,
event or processing, for *both* window assignment and expiration,
otherwise, you get incorrect results. Specifically, you can "trick" Streams
into processing that event for the expired window and get the incorrect
result of "(A,2017-06-21): 1".
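The expiration rule being described can be sketched as a single predicate (a hypothetical helper, not the actual Streams implementation); the point above is that "observed time" must be the same notion of time used for window assignment:

```java
// A window is still retained if observed time has not advanced past the
// window start by more than the retention period. Whether "observed time"
// is event time or processing time must match the time used for window
// assignment, or expired windows can be resurrected with incorrect values.
class WindowRetention {
    boolean retained(long windowStartMs, long observedTimeMs, long retentionMs) {
        return observedTimeMs - windowStartMs < retentionMs;
    }
}
```

With event time, the year-old event for (A,2017-06-21) fails this check under a 1-month retention and must be dropped; under a 2-year retention it passes, and the count can be corrected to 4.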

Is that an accurate depiction of the situation, or have I missed something?

Thanks,
-John

On Thu, Jun 21, 2018 at 5:52 PM Matthias J. Sax 
wrote:

> Can't you increase retention time accordingly to make sure that "old"
> metrics are not dropped?
>
> -Matthias
>
> On 6/21/18 2:07 PM, Sicheng Liu wrote:
> > Because we might get very "old" metrics (the timestamp on the metric is
> > very old, even though the metric was just delivered, for example, by a
> > backfill). If you use event-time for retention, these old metrics could
> > be dropped and won't be aggregated. If we use process-time, at least they
> > will stay in the state store for some time for aggregation.
> >
> > On Thu, Jun 21, 2018 at 1:24 PM, Matthias J. Sax 
> > wrote:
> >
> >> I don't understand why event-time retention time cannot be used? Can you
> >> elaborate?
> >>
> >> -Matthias
> >>
> >> On 6/21/18 10:59 AM, Sicheng Liu wrote:
> >>> Hi All,
> >>>
> >>> We have a use case where we aggregate some metrics by their event-time
> >>> (the timestamp on the metric itself) using the simplest tumbling window.
> >>> The window itself can be given a retention, but since we are aggregating
> >>> by event-time, the retention has to be based on event-time too. However,
> >>> in our scenario, we have some late arrival metrics (up to one year), and
> >>> we hope the window retention can be based on process-time, so that we
> >>> can hold the late arrival metrics for some time and expire them after
> >>> some hours, even without new metrics of the same aggregation key coming.
> >>>
> >>> We have tried:
> >>> 1. Set TTL on RocksDB but it is disabled in Kafka Streams.
> >>> 2. Using low level processor API but scanning the statestore and delete
> >> one
> >>> by one significantly drops the performance.
> >>>
> >>> Please let us know if it is possible to aggregate by event-time but
> >> setting
> >>> the window retention based on its process-time.
> >>>
> >>> Thanks,
> >>> Sicheng
> >>>
> >>
> >>
> >
>
>


[DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-26 Thread John Roesler
Hello devs and users,

Please take some time to consider this proposal for Kafka Streams:

KIP-328: Ability to suppress updates for KTables

link: https://cwiki.apache.org/confluence/x/sQU0BQ

The basic idea is to provide:
* more usable control over update rate (vs the current state store caches)
* the final-result-for-windowed-computations feature which several people
have requested

I look forward to your feedback!

Thanks,
-John


Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Thanks for taking look, Ted,

I agree this is a departure from the conventions of Streams DSL.

Most of our config objects have one or two "required" parameters, which fit
naturally with the static factory method approach. TimeWindows, for example,
requires a size parameter, so we can naturally say TimeWindows.of(size).

I think in the case of a suppression, there's really no "core" parameter,
and "Suppression.of()" seems sillier than "new Suppression()". I think that
Suppression.of(duration) would be ambiguous, since there are many durations
that we can configure.

However, thinking about it again, I suppose that I can give each
configuration method a static version, which would let you replace "new
Suppression()." with "Suppression." in all the examples. Basically, instead
of "of()", we'd support any of the methods I listed.

For example:

windowCounts
.suppress(
Suppression
.suppressLateEvents(Duration.ofMinutes(10))
.suppressIntermediateEvents(
IntermediateSuppression.emitAfter(Duration.ofMinutes(10))
)
);


Does that seem better?

Thanks,
-John


On Wed, Jun 27, 2018 at 12:44 AM Ted Yu  wrote:

> I started to read this KIP which contains a lot of materials.
>
> One suggestion:
>
> .suppress(
> new Suppression()
>
>
> Do you think it would be more consistent with the rest of Streams data
> structures by supporting `of` ?
>
> Suppression.of(Duration.ofMinutes(10))
>
>
> Cheers
>
>
>
> On Tue, Jun 26, 2018 at 1:11 PM, John Roesler  wrote:
>
> > Hello devs and users,
> >
> > Please take some time to consider this proposal for Kafka Streams:
> >
> > KIP-328: Ability to suppress updates for KTables
> >
> > link: https://cwiki.apache.org/confluence/x/sQU0BQ
> >
> > The basic idea is to provide:
> > * more usable control over update rate (vs the current state store
> > caches)
> > * the final-result-for-windowed-computations feature which several people
> > have requested
> >
> > I look forward to your feedback!
> >
> > Thanks,
> > -John
> >
>


Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Hello again all,

I realized today that I neglected to include metrics in the proposal. I
have added them just now.

Thanks,
-John

On Tue, Jun 26, 2018 at 3:11 PM John Roesler  wrote:

> Hello devs and users,
>
> Please take some time to consider this proposal for Kafka Streams:
>
> KIP-328: Ability to suppress updates for KTables
>
> link: https://cwiki.apache.org/confluence/x/sQU0BQ
>
> The basic idea is to provide:
> * more usable control over update rate (vs the current state store caches)
> * the final-result-for-windowed-computations feature which several people
> have requested
>
> I look forward to your feedback!
>
> Thanks,
> -John
>


Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Thanks for the feedback, Matthias,

It seems like in straightforward relational processing cases, it would not
make sense to bound the lateness of KTables. In general, it seems better to
have "guard rails" in place that make it easier to write sensible programs
than insensible ones.

But I'm still going to argue in favor of keeping it for all KTables ;)

1. I believe it is simpler to understand the operator if it has one uniform
definition, regardless of context. It's well defined and intuitive what
will happen when you use late-event suppression on a KTable, so I think
nothing surprising or dangerous will happen in that case. From my
perspective, having two sets of allowed operations is actually an increase
in cognitive complexity.

2. To me, it's not crazy to use the operator this way. For example, in lieu
of full-featured timestamp semantics, I can implement MVCC behavior when
building a KTable by "suppressLateEvents(Duration.ZERO)". I suspect that
there are other, non-obvious applications of suppressing late events on
KTables.

3. Not to get too much into implementation details in a KIP discussion, but
if we did want to make late-event suppression available only on windowed
KTables, we have two enforcement options:
  a. check when we build the topology - this would be simple to implement,
but would be a runtime check. Hopefully, people write tests for their
topology before deploying them, so the feedback loop isn't instantaneous,
but it's not too long either.
  b. add a new WindowedKTable type - this would be a compile time check,
but would also be substantial increase of both interface and code
complexity.

We should definitely strive to have guard rails protecting against
surprising or dangerous behavior. Protecting against programs that we don't
currently predict is a lesser benefit, and I think we can put up guard
rails on a case-by-case basis for that. It seems like the increase in
cognitive (and potentially code and interface) complexity makes me think we
should skip this case.

What do you think?

Thanks,
-John

On Wed, Jun 27, 2018 at 11:59 AM Matthias J. Sax 
wrote:

> Thanks for the KIP John.
>
> One initial comment about the last example "Bounded lateness": for a
> non-windowed KTable, bounding the lateness does not really make sense,
> does it?
>
> Thus, I am wondering if we should allow `suppressLateEvents()` for this
> case? It seems better to only allow it for windowed KTables.
>
>
> -Matthias
>
>
> On 6/27/18 8:53 AM, Ted Yu wrote:
> > I noticed this (lack of primary parameter) as well.
> >
> > What you gave as a new example is semantically the same as what I
> > suggested. So it is good by me.
> >
> > Thanks
> >
> > On Wed, Jun 27, 2018 at 7:31 AM, John Roesler  wrote:
> >
> >> Thanks for taking look, Ted,
> >>
> >> I agree this is a departure from the conventions of Streams DSL.
> >>
> >> Most of our config objects have one or two "required" parameters, which
> >> fit naturally with the static factory method approach. TimeWindows, for
> >> example, requires a size parameter, so we can naturally say
> >> TimeWindows.of(size).
> >>
> >> I think in the case of a suppression, there's really no "core"
> >> parameter, and "Suppression.of()" seems sillier than "new Suppression()".
> >> I think that Suppression.of(duration) would be ambiguous, since there
> >> are many durations that we can configure.
> >>
> >> However, thinking about it again, I suppose that I can give each
> >> configuration method a static version, which would let you replace "new
> >> Suppression()." with "Suppression." in all the examples. Basically,
> >> instead of "of()", we'd support any of the methods I listed.
> >>
> >> For example:
> >>
> >> windowCounts
> >> .suppress(
> >> Suppression
> >> .suppressLateEvents(Duration.ofMinutes(10))
> >> .suppressIntermediateEvents(
> >> IntermediateSuppression.emitAfter(Duration.ofMinutes(10))
> >> )
> >> );
> >>
> >>
> >> Does that seem better?
> >>
> >> Thanks,
> >> -John
> >>
> >>
> >> On Wed, Jun 27, 2018 at 12:44 AM Ted Yu  wrote:
> >>
> >>> I started to read this KIP which contains a lot of materials.
> >>>
> >>> One suggestion:
> >>>
> >>> .suppress(
> >>> new Suppression()
> >>>
> >>>
> >>> Do y

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
changelog stream? And the semantic difference between "skipped record due
> to window retention" and "late-event-suppression" is quite obscure (btw I
> am not sure it is true that "Skipped records are records that are for one
> reason or another invalid.", since "skipped record due to window retention
> time" is not really due to an invalid record, but some window store
> implementation details, right?)
>
> Thinking about this further, although I understand the intention to propose
> an unified API for all three motivation requests, I feel the "Final value
> of a window" request may better be handled in a more restricted interface.
>
>
> So just throwing out a bold / controversial idea to this proposal: instead
> of using a unified suppress() for all three motivation scenarios, we have:
>
> 1) KTable#suppress() for "Request shaping" and "Easier config", and it will
> only for intermediate-event-suppression, and in this case, for both
> windowed and non-windowed KTable, the suppression semantics can be
> dependent on each key's record timestamp plus the byte buffer size limit /
> buffer strategy.
>
> 2) In TimeWindowedStream / SessionWindowedStream#aggregate() that result in
> a windowed KTable (note that although KGroupedTable#aggregate can also
> result in a windowed KTable, its window semantics is not very well defined
> I'd suggest we defer its discussion later), we add another config object,
> e.g.:
>
> TimeWindowedStream#aggregate(final Initializer<VR> initializer,
>     final Aggregator<? super K, ? super V, VR> aggregator,
>     final Materialized<K, VR, WindowStore<Bytes, byte[]>> materialized,
>     final Emitted emitted);
>
> public class Emitted {
>
>     static Emitted onlyOnceAfterWindowClosed(final long late-period-allowed);
>
>     static Emitted wheneverWindowUpdated();  // this may still be subject
>     // to caching effects, so not exactly every update..
>
> }
>
> 
>
> The Emitted config option is going to be much less expressive than
> `Suppressed`, intentionally, to only cover the "Final value of a window"
> case. Note that the resulted window KTable can still be suppressed
> programmatically, but if it is already been emitted only once, then the
> suppress function will take no effect.
>
> In this case, the difference of "late-period-allowed" v.s.
> "Windows.until()" is that, the former determines if or not a record will be
> applied to update the window or not, and it is controlled in the
> WindowedStreamAggregateProcessor, and whenever an event gets dropped
> because of it we record it in a new, say "too-late-records" metric (same to
> "late-event-suppression" actually, just using a different name, while the
> latter only controls how long at least each window will be retained for
> queries and should normally be larger than (window size + late
> period-allowed). From implementation's pov, if the retention time of a
> window is less than (window size + late-period-allowed), the Processor may
> not be able to find any matching window when first trying to get it from
> store, and it then need to tell if it is because the key is never been
> updated for this window or because the window retention has elapsed, hence
> it needs to be aware of the window retention time. And in the latter case,
> it will drop it on the floor and also record it in "too-late-records"
> metrics. And also this emit policy would not need any buffering, since the
> original store's cache contains the record context already need for
> flushing downstream.
>
> My primary motivation is that, from user's perspective, this may be easier
> to comprehensive and reason from the metrics. But if people think it
> actually does not make things better, I'm happy to rethink the current
> proposal.
>
>
>
>
> Guozhang
>
>
> On Wed, Jun 27, 2018 at 12:04 PM, John Roesler  wrote:
>
> > Thanks for the feedback, Matthias,
> >
> > It seems like in straightforward relational processing cases, it would
> > not make sense to bound the lateness of KTables. In general, it seems
> > better to have "guard rails" in place that make it easier to write
> > sensible programs than insensible ones.
> >
> > But I'm still going to argue in favor of keeping it for all KTables ;)
> >
> > 1. I believe it is simpler to understand the operator if it has one
> > uniform definition, regardless of context. It's well defined an

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
> 1. Does "suppressLateEvents" with parameter Y != X (window retention time)
> for windowed stores make sense in practice?
> 2. Does "suppressLateEvents" with any parameter Y for non-windowed stores
> make sense in practice?
>
>
>
> Guozhang
>
>
> On Fri, Jun 29, 2018 at 2:26 PM, Bill Bejeck  wrote:
>
> > Thanks for the explanation, that does make sense. I have some questions
> > on operations, but I'll just wait for the PR and tests.
> >
> > Thanks,
> > Bill
> >
> > On Wed, Jun 27, 2018 at 8:14 PM John Roesler  wrote:
> >
> > > Hi Bill,
> > >
> > > Thanks for the review!
> > >
> > > Your question is very much applicable to the KIP and not at all an
> > > implementation detail. Thanks for bringing it up.
> > >
> > > I'm proposing not to change the existing caches and configurations at
> > > all (for now).
> > >
> > > Imagine you have a topology like this:
> > > commit.interval.ms = 100
> > >
> > > (ktable1 (cached)) -> (suppress emitAfter 200)
> > >
> > > The first ktable (ktable1) will respect the commit interval and buffer
> > > events for 100ms before logging, storing, or forwarding them (IIRC).
> > > Therefore, the second ktable (suppress) will only see the events at a
> > > rate of once per 100ms. It will apply its own buffering, and emit once
> > > per 200ms.
> > > This case is pretty trivial because the suppress time is a multiple of
> > > the commit interval.
> > >
> > > When it's not an integer multiple, you'll get behavior like in this
> > > marble diagram:
> > >
> > >
> > > <-(k:1)--(k:2)--(k:3)--(k:4)--(k:5)--(k:6)->
> > >
> > > [ KTable caching with commit interval = 2 ]
> > >
> > > <(k:2)-(k:4)-(k:6)->
> > >
> > >   [ suppress with emitAfter = 3 ]
> > >
> > > <---(k:2)(k:6)->
> > >
> > >
> > > If this behavior isn't desired (for example, if you wanted to emit
> > > (k:3) at time 3), I'd recommend setting the "cache.max.bytes.buffering"
> > > to 0 or
> > > modifying the topology to disable caching. Then, the behavior is more
> > > simply determined just by the suppress operator.
> > >
> > > Does that seem right to you?
> > >
> > >
> > > Regarding the changelogs, because the suppression operator hangs onto
> > > events for a while, it will need its own changelog. The changelog
> > > should represent the current state of the buffer at all times. So when
> > > the suppress operator sees (k:2), for example, it will log (k:2). When
> > > it later gets to time 3, it's time to emit (k:2) downstream. Because k
> > > is no longer buffered, the suppress operator will log (k:null). Thus,
> > > when recovering, it can rebuild the buffer by reading its changelog.
> > >
> > > What do you think about this?
> > >
> > > Thanks,
> > > -John
> > >
> > >
> > >
> > > On Wed, Jun 27, 2018 at 4:16 PM Bill Bejeck  wrote:
> > >
> > > > Hi John,  thanks for the KIP.
> > > >
> > > > Early on in the KIP, you mention the current approaches for
> > > > controlling the rate of downstream records from a KTable: cache size
> > > > configuration and commit time.
> > > >
> > > > Will these configuration parameters still be in effect for tables
> > > > that don't use suppression? For tables taking advantage of
> > > > suppression, will these configurations have no impact?
> > > > This last question may be too implementation-specific, but if the
> > > > requested suppression time is longer than the specified commit time,
> > > > will the latest record in the suppression buffer get stored in a
> > > > changelog?
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > > > On Wed, Jun 27, 2018 at 3:04 PM John Roesler 
> > wrote:
> > > >
> > > > > Thanks for the feedback, Matthias,
> > > > >
> > > > > It seems like in straightforward relational processing cases, it
> > would
> > > > not
> > > > > make sense to bound the lateness of KTables. I

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
o Materialized builders have to
provide seemingly obvious type bounds.


= Conclusion =


I think option 2 is more "normal" and discoverable. It does have a
downside, but it's one that's pre-existing elsewhere in the DSL.

WDYT? Would the addition of this "recipe" method to Suppression resolve
your concern?

Thanks again,
-John

On Sun, Jul 1, 2018 at 11:24 PM Guozhang Wang  wrote:

> Hi John,
>
> Regarding the metrics: yeah I think I'm with you that the dropped records
> due to window retention or emit suppression policies should be recorded
> differently, and using this KIP's proposed metric would be fine. If you
> also think we can use this KIP's proposed metrics to cover the records
> skipped due to window retention, then we can include the changes in this
> KIP as well.
>
> Regarding the current proposal, I'm actually not too worried about the
> inconsistency between query semantics and downstream emit semantics. For
> queries, we will always return the current running results of the windows,
> be it partial or final results depending on the window retention time
> anyways, which has nothing to do whether the emitted stream should be one
> final output per key or not. I also agree that having a unified operation
> is generally better for users to focus on leveraging that one only than
> learning about two sets of operations. The only question I had is, for final
> updates of window stores, if it is a bit awkward to understand the
> configuration combo. Thinking about this more, I think my root worry is in the
> "suppressLateEvents" call for windowed tables, since from a user
> perspective: if my retention time is X which means "pay the cost to allow
> late records up to X to still be applied updating the tables", why would I
> ever want to suppressLateEvents by Y ( < X), to say "do not send the
> updates up to Y, which means the downstream operator or sink topic for this
> stream would actually see a truncated update stream while I've paid larger
> cost for that"; and of course, Y > X would not make sense either as you
> would not see any updates later than X anyways. So in all, my feeling is
> that it makes less sense for windowed table's "suppressLateEvents" with a
> parameter that is not equal to the window retention, and opening the door
> in the current proposal may confuse people with that.
>
> Again, above is just a subjective opinion and probably we can also bring up
> some scenarios where users do want to set X != Y... but personally I feel
> that even if the semantics for this scenario are intuitive for users to
> understand, does that really make sense and should we really open the door
> for it. So I think the benefits of separating the final update into a
> separate API may outweigh the advantage of having one uniform definition. And
> for my alternative proposal, the rationale was from both my concern about
> "suppressLateEvents" for windowed store, and Matthias' question about
> "suppressLateEvents" for non-windowed stores, that if it is less meaningful
> for both, we can consider removing it completely and only do
> "IntermediateSuppression" in Suppress instead.
>
> So I'd summarize my thoughts in the following questions:
>
> 1. Does "suppressLateEvents" with parameter Y != X (window retention time)
> for windowed stores make sense in practice?
> 2. Does "suppressLateEvents" with any parameter Y for non-windowed stores
> make sense in practice?
>
>
>
> Guozhang
>
>
> On Fri, Jun 29, 2018 at 2:26 PM, Bill Bejeck  wrote:
>
> > Thanks for the explanation, that does make sense.  I have some questions
> on
> > operations, but I'll just wait for the PR and tests.
> >
> > Thanks,
> > Bill
> >
> > On Wed, Jun 27, 2018 at 8:14 PM John Roesler  wrote:
> >
> > > Hi Bill,
> > >
> > > Thanks for the review!
> > >
> > > Your question is very much applicable to the KIP and not at all an
> > > implementation detail. Thanks for bringing it up.
> > >
> > > I'm proposing not to change the existing caches and configurations at
> all
> > > (for now).
> > >
> > > Imagine you have a topology like this:
> > > commit.interval.ms = 100
> > >
> > > (ktable1 (cached)) -> (suppress emitAfter 200)
> > >
> > > The first ktable (ktable1) will respect the commit interval and buffer
> > > events for 100ms before logging, storing, or forwarding them (IIRC).
> > > Therefore, the second ktable (suppress) will only see the events at a

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
In fact, to push the idea further (which IIRC is what Matthias originally
proposed), if we can accept "Suppression#finalResultsOnly" in my last
email, then we could also consider whether to eliminate
"suppressLateEvents" entirely.

We could always add it later, but you've both expressed doubt that there
are practical use cases for it outside of final-results.

-John

On Mon, Jul 2, 2018 at 12:27 PM John Roesler  wrote:

> Hi again, Guozhang ;) Here's the second part of my response...
>
> It seems like your main concern is: "if I'm a user who wants final update
> semantics, how complicated is it for me to get it?"
>
> I think we have to assume that people don't always have time to become
> deeply familiar with all the nuances of a programming environment before
> they use it. Especially if they're evaluating several frameworks for their
> use case, it's very valuable to make it as obvious as possible how to
> accomplish various computations with Streams.
>
> To me the biggest question is whether with a fresh perspective, people
> would say "oh, I get it, I have to bound my lateness and suppress
> intermediate updates, and of course I'll get only the final result!", or if
> it's more like "wtf? all I want is the final result, what are all these
> parameters?".
>
> I was talking with Matthias a while back, and he had an idea that I think
> can help, which is to essentially set up a final-result recipe in addition
> to the raw parameters. I previously thought that it wouldn't be possible to
> restrict its usage to Windowed KTables, but thinking about it again this
> weekend, I have a couple of ideas:
>
> 
> = 1. Static Wrapper =
> 
> We can define an extra static function that "wraps" a KTable with
> final-result semantics.
>
> public static <K extends Windowed, V> KTable<K, V> finalResultsOnly(
>   final KTable<K, V> windowedKTable,
>   final Duration maxAllowedLateness,
>   final Suppression.BufferFullStrategy bufferFullStrategy) {
> return windowedKTable.suppress(
> Suppression.suppressLateEvents(maxAllowedLateness)
>.suppressIntermediateEvents(
>  IntermediateSuppression
>.emitAfter(maxAllowedLateness)
>.bufferFullStrategy(bufferFullStrategy)
>)
> );
> }
>
> Because windowedKTable is a parameter, the static function can easily
> impose an extra bound on the key type, that it extends Windowed. This would
> make "final results only" only available on windowed ktables.
>
> Here's how it would look to use:
>
> final KTable<Windowed<String>, Long> windowCounts = ...
> final KTable<Windowed<String>, Long> finalCounts =
>   finalResultsOnly(
> windowCounts,
> Duration.ofMinutes(10),
> Suppression.BufferFullStrategy.SHUT_DOWN
>   );
>
> Trying to use it on a non-windowed KTable yields:
>
>> Error:(129, 35) java: method finalResultsOnly in class
>> org.apache.kafka.streams.kstream.internals.KTableAggregateTest cannot be
>> applied to given types;
>>   required:
>> org.apache.kafka.streams.kstream.KTable,java.time.Duration,org.apache.kafka.streams.kstream.Suppression.BufferFullStrategy
>>   found:
>> org.apache.kafka.streams.kstream.KTable,java.time.Duration,org.apache.kafka.streams.kstream.Suppression.BufferFullStrategy
>>   reason: inference variable K has incompatible bounds
>> equality constraints: java.lang.String
>> upper bounds: org.apache.kafka.streams.kstream.Windowed
>
>
>
> =
> = 2. Add <K, V> parameters and recipe method to Suppression =
> =
>
> By adding K,V parameters to Suppression, we can provide a similarly
> bounded config method directly on the Suppression class:
>
> public static <K extends Windowed, V> Suppression<K, V>
> finalResultsOnly(final Duration maxAllowedLateness, final
> BufferFullStrategy bufferFullStrategy) {
> return Suppression
> .suppressLateEvents(maxAllowedLateness)
> .suppressIntermediateEvents(IntermediateSuppression
> .emitAfter(maxAllowedLateness)
> .bufferFullStrategy(bufferFullStrategy)
> );
> }
>
> Then, here's how it would look to use it:
>
> final KTable<Windowed<String>, Long> windowCounts = ...
> final KTable<Windowed<String>, Long> finalCounts =
>   windowCounts.suppress(
> Suppression.finalResultsOnly(
>   Duration.ofMinutes(10)
>   Suppression.BufferFullStrategy.SHUT_DOWN
> )
>   );
>
> Trying to use it on a non-windowed ktable yields:
>
>> Error:(127, 35) java: method finalResultsOnly in cl

Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-12 Thread John Roesler
Hi Vasily,

Thanks for the email.

To answer your question: you should reset the application basically any
time you change the topology. Some transitions are safe, but others will
result in data loss or corruption. Rather than try to reason about which is
which, it's much safer just to either reset the app or not change it (if it
has important state).

Beyond changes that you make to the topology, we spend a lot of effort to
try and make sure that different versions of Streams will produce the same
topology, so unless the release notes say otherwise, you should be able to
upgrade without a reset.


I can't say right now whether those wacky behaviors are bugs or the result
of changing the topology without a reset. Or if they are correct but
surprising behavior somehow. I'll look into it tomorrow. Do feel free to
open a Jira ticket if you think you have found a bug, especially if you can
describe a repro. Knowing your topology before and after the change would
also be immensely helpful. You can print it with Topology.describe().

Regardless, I'll make a note to take a look at the code tomorrow and try to
decide if you should expect these behaviors with "clean" topology changes.

Thanks,
-John

On Thu, Jul 12, 2018 at 11:51 AM Vasily Sulatskov 
wrote:

> Hi,
>
> I am doing some experiments with kafka-streams KGroupedTable
> aggregation, and admittedly I am not wiping data properly on each
> restart, partially because I also wonder what would happen if you
> change a streams topology without doing a proper reset.
>
> I've noticed that from time to time, kafka-streams
> KGroupedTable.reduce() can call subtractor function with null
> aggregator value, and if you try to work around that, by interpreting
> null aggregator value as zero for numeric value you get incorrect
> aggregation result.
>
> I do understand that the proper way of handling this is to do a reset
> on topology changes, but I'd like to understand if there's any
> legitimate case when kafka-streams can call an adder or a subtractor
> with null aggregator value, and should I plan for this, or should I
> interpret this as an invalid state, and terminate the application, and
> do a proper reset?
>
> Also, I can't seem to find a guide which explains when application
> reset is necessary. Intuitively it seems that it should be done every
> time a topology changes. Any other cases?
>
> I tried to debug where the null value comes from and it seems that
> KTableReduce.process() is getting called with Change value with
> newValue == null, and some non-null oldValue. Which leads to the
> subtractor being called with null aggregator value. I wonder how it is
> possible to have an old value for a key without a new value (does it
> happen because of the auto commit interval?).
>
> I've also noticed that it's possible for an input value from a topic
> to bypass aggregation function entirely and be directly transmitted to
> the output in certain cases: oldAgg is null, newValue is not null and
> oldValue is null - in that case newValue will be transmitted directly
> to the output. I suppose it's the correct behaviour, but feels a bit
> weird nonetheless. And I've actually been able to observe this
> behaviour in practice. I suppose it's also caused by this happening
> right before a commit happens, and the message is sent to a changelog
> topic.
>
> Please can someone with more knowledge shed some light on these issues?
>
> --
> Best regards,
> Vasily Sulatskov
>


Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-13 Thread John Roesler
> val agg = Option(aggValue)
> TimedValueWithKey(
>   value = agg.map(_.value).getOrElse(0) - newValue.value,
>   timestamp =
>
> Utils.latestTimestamp(agg.map(_.timestamp).getOrElse(zeroTimestamp),
> newValue.timestamp),
>   key = "reduced"
> )
>   }, "aggregated-table")
>   .toStream
>   .to("slope-aggregated-table")
>
> I log all calls to adder and subtractor, so I am able to see what's
> going on there, as well as I track the original keys of the aggregated
> values and their timestamps, so it's relatively easy to see how the
> data goes through this topology
>
> In order to reproduce this behavior I need to:
> 1. Start a full topology (with table aggregation)
> 2. Start without table aggregation (no app reset)
> 3. Start with table aggregation (no app reset)
>
> Bellow is an interpretation of the adder/subtractor logs for a given
> key/window in the chronological order
>
> SUB: newValue=(key2, 732, 10:50:40) aggValue=null
> ADD: newValue=(key2, 751, 10:50:59) aggValue=(-732, 10:50:40)
> SUB: newValue=(key1, 732, 10:50:40) aggValue=(19, 10:50:59)
> ADD: newValue=(key1, 751, 10:50:59) aggValue=(-713, 10:50:59)
> SUB: newValue=(key3, 732, 10:50:40) aggValue=(38, 10:50:59)
> ADD: newValue=(key3, 751, 10:50:59) aggValue=(-694, 10:50:59)
>
> And in the end the last value that's materialized for that window
> (i.e. windowed key) in the kafka topic is 57, i.e. a increase in value
> for a single key between some point in the middle of the window and at
> the end of the window, times 3. As opposed to the expected value of
> 751 * 3 = 2253 (sum of last values in a time window for all keys being
> aggregated).
>
> It's clear to me that I should do an application reset, but I also
> would like to understand, should I expect adder/subtractor being
> called with null aggValue, or is it a clear sign that something went
> horribly wrong?
>
> On Fri, Jul 13, 2018 at 12:19 AM John Roesler  wrote:
> >
> > Hi Vasily,
> >
> > Thanks for the email.
> >
> > To answer your question: you should reset the application basically any
> > time you change the topology. Some transitions are safe, but others will
> > result in data loss or corruption. Rather than try to reason about which
> is
> > which, it's much safer just to either reset the app or not change it (if
> it
> > has important state).
> >
> > Beyond changes that you make to the topology, we spend a lot of effort to
> > try and make sure that different versions of Streams will produce the
> same
> > topology, so unless the release notes say otherwise, you should be able
> to
> > upgrade without a reset.
> >
> >
> > I can't say right now whether those wacky behaviors are bugs or the
> result
> > of changing the topology without a reset. Or if they are correct but
> > surprising behavior somehow. I'll look into it tomorrow. Do feel free to
> > open a Jira ticket if you think you have found a bug, especially if you
> can
> > describe a repro. Knowing your topology before and after the change would
> > also be immensely helpful. You can print it with Topology.describe().
> >
> > Regardless, I'll make a note to take a look at the code tomorrow and try
> to
> > decide if you should expect these behaviors with "clean" topology
> changes.
> >
> > Thanks,
> > -John
> >
> > On Thu, Jul 12, 2018 at 11:51 AM Vasily Sulatskov 
> > wrote:
> >
> > > Hi,
> > >
> > > I am doing some experiments with kafka-streams KGroupedTable
> > > aggregation, and admittedly I am not wiping data properly on each
> > > restart, partially because I also wonder what would happen if you
> > > change a streams topology without doing a proper reset.
> > >
> > > I've noticed that from time to time, kafka-streams
> > > KGroupedTable.reduce() can call subtractor function with null
> > > aggregator value, and if you try to work around that, by interpreting
> > > null aggregator value as zero for numeric value you get incorrect
> > > aggregation result.
> > >
> > > I do understand that the proper way of handling this is to do a reset
> > > on topology changes, but I'd like to understand if there's any
> > > legitimate case when kafka-streams can call an adder or a subtractor
> > > with null aggregator value, and should I plan for this, or should I
> > > interpret this as an invalid state, and terminate the application, and
> > > do a proper reset?
> > >
> > > Also, I can't seem to find a guide which explains when application
> > > reset is necessary. Intuitively it seems that it should be done every
> > > time a topology changes. Any other cases?
> > >
> > > I tried to debug where the null value comes from and it seems that
> > > KTableReduce.process() is getting called with Change value with
> > > newValue == null, and some non-null oldValue. Which leads to the
> > > subtractor being called with null aggregator value. I wonder how it is
> > > possible to have an old value for a key without a new value (does it
> > > happen because of the auto commit interval?).
> > >
> > > I've also noticed that it's possible for an input value from a topic
> > > to bypass aggregation function entirely and be directly transmitted to
> > > the output in certain cases: oldAgg is null, newValue is not null and
> > > oldValue is null - in that case newValue will be transmitted directly
> > > to the output. I suppose it's the correct behaviour, but feels a bit
> > > weird nonetheless. And I've actually been able to observe this
> > > behaviour in practice. I suppose it's also caused by this happening
> > > right before a commit happens, and the message is sent to a changelog
> > > topic.
> > >
> > > Please can someone with more knowledge shed some light on these issues?
> > >
> > > --
> > > Best regards,
> > > Vasily Sulatskov
> > >
>
>
>
> --
> Best regards,
> Vasily Sulatskov
>


Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-13 Thread John Roesler
Hi Vasily,

I'm glad you're making me look at this; it's good homework for me!

This is very non-obvious, but here's what happens:

KStreamsReduce is a Processor of (K, V) => (K, Change<V>). I.e., it emits
new/old Change pairs as the value.

Next is the Select (aka GroupBy). In the DSL code, this is the
KTableRepartitionMap (we call it a repartition when you select a new key,
since the new keys may belong to different partitions).
KTableRepartitionMap is a processor that does two things:
1. it maps K => K1 (new keys) and V => V1 (new values)
2. it "explodes" Change(new, old) into [ Change(null, old), Change(new,
null)]
In other words, it turns each Change event into two events: a retraction
and an update
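[Editor's note: the "explode" step and the downstream reduce can be sketched with plain Java, no Streams dependency. The class and method names here are invented for illustration; in Kafka Streams the corresponding logic lives in KTableRepartitionMap and KTableReduce.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the repartition "explode" plus the downstream reduce (illustrative only).
public class ChangeExplodeSim {
    public static class Change {
        public final Integer newValue;
        public final Integer oldValue;
        public Change(Integer newValue, Integer oldValue) {
            this.newValue = newValue;
            this.oldValue = oldValue;
        }
    }

    // One Change(new, old) becomes a retraction followed by an update.
    public static List<Change> explode(Change c) {
        List<Change> out = new ArrayList<>();
        if (c.oldValue != null) out.add(new Change(null, c.oldValue)); // retraction
        if (c.newValue != null) out.add(new Change(c.newValue, null)); // update
        return out;
    }

    // Reduce step: the adder applies the update, the subtractor applies the
    // retraction, maintaining a running sum per key in the store.
    public static void reduce(Map<String, Integer> store, String key, Change c) {
        int agg = store.getOrDefault(key, 0);
        if (c.newValue != null) agg = agg + c.newValue; // adder
        if (c.oldValue != null) agg = agg - c.oldValue; // subtractor
        store.put(key, agg);
    }
}
```

For example, a value transitioning from 2 to 5 arrives downstream as a retraction of 2 and an update of 5, so the aggregate moves by +3.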

Next comes the reduce operation. In building the processor node for this
operation, we create the sink, repartition topic, and source, followed by
the actual Reduce node. So if you want to look at how the changes get
serialized and deserialized, it's in KGroupedTableImpl#buildAggregate.
You'll see that the sink and source use a ChangedSerializer and ChangedDeserializer.

By looking into those implementations, I found that they depend on each
Change containing just one of new OR old. They serialize the underlying
value using the serde you provide, along with a single byte that signifies
if the serialized value is the new or old value, which the deserializer
uses on the receiving end to turn it back into a Change(new, null) or
Change(null, old) as appropriate. This is why the repartition topic looks
like it's just the raw data. It basically is, except for the magic byte.

Does that make sense?

Also, I've created https://issues.apache.org/jira/browse/KAFKA-7161 and
https://github.com/apache/kafka/pull/5366 . Do you mind taking a look and
leaving any feedback you have?

Thanks,
-John

On Fri, Jul 13, 2018 at 12:00 PM Vasily Sulatskov 
wrote:

> Hi John,
>
> Thanks for your explanation.
>
> I have an answer to the practical question, i.e. a null aggregator
> value should be interpreted as a fatal application error.
>
> On the other hand, looking at the app topology, I see that a message
> from KSTREAM-REDUCE-02 / "table" goes to
> KTABLE-SELECT-06 which in turn forwards data to
> KSTREAM-SINK-07 (topic: aggregated-table-repartition), and at
> this point I assume that data goes back to kafka into a *-repartition
> topic, after that the message is read from kafka by
> KSTREAM-SOURCE-08 (topics: [aggregated-table-repartition]),
> and finally gets to Processor: KTABLE-REDUCE-09 (stores:
> [aggregated-table]), where the actual aggregation takes place. What I
> don't get is where this Change value comes from, I mean if it's been
> produced by KSTREAM-REDUCE-02, but it shouldn't matter as the
> message goes through kafka where it gets serialized, and looking at
> kafka "repartition" topic, it contains regular values, not a pair of
> old/new.
>
> As far as I understand, Change is a purely in-memory representation of
> the state for a particular key, and at no point it's serialized back
> to kafka, yet somehow this Change values makes it to reducer. I feel
> like I am missing something here. Could you please clarify this?
>
> Can you please point me to a place in kafka-streams sources where a
> Change of newValue/oldValue is produced, so I could take a look? I
> found KTableReduce implementation, but can't find who makes these
> Change values.
> On Fri, Jul 13, 2018 at 6:17 PM John Roesler  wrote:
> >
> > Hi again Vasily,
> >
> > Ok, it looks to me like this behavior is the result of the un-clean
> > topology change.
> >
> > Just in case you're interested, here's what I think happened.
> >
> > 1. Your reduce node in subtopology1 (KSTREAM-REDUCE-02 / "table"
> )
> > internally emits pairs of "oldValue"/"newValue" . (side-note: It's by
> > forwarding both the old and new value that we are able to maintain
> > aggregates using the subtractor/adder pairs)
> >
> > 2. In the full topology, these old/new pairs go through some
> > transformations, but still in some form eventually make their way down to
> > the reduce node (KTABLE-REDUCE-09/"aggregated-table").
> >
> > 3. The reduce processor logic looks like this:
> > final V oldAgg = store.get(key);
> > V newAgg = oldAgg;
> >
> > // first try to add the new value
> > if (value.newValue != null) {
> > if (newAgg == null) {
> > newAgg = value.newValue;
> > } else {
> > newAgg = addReducer.apply(newAgg, value.newValue);
> > }
> > }
> >
> > // then try

Re: Kafka Streams - Merge vs. Join

2018-08-09 Thread John Roesler
Hi John,

Sorry for the confusion! I just noticed that we failed to document the
merge operator. I've created
https://issues.apache.org/jira/browse/KAFKA-7269 to fix it.

But in the mean time,
* merge: interleave the records from two streams to produce one collated
stream
* join: compute a new stream by fusing together records from the two inputs
by key

For example:
input-1:
(A, 1)
(B, 2)

input-2:
(A, 500)
(C, 60)

join( (l,r) -> new KeyValue<>(l, r) ):  // simplified API
(A, (1, 500) )
(B, (2, null) )
(C, (null, 60) )

merge:
(A, 1)
(A, 500)
(B, 2)
(C, 60)
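[Editor's note: the same contrast, executable with plain collections rather than the Streams API. This is an illustrative sketch; records are simplified to strings and integer values, and the outer join is represented as a pair where a missing side is null.]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-collections model of merge vs. outer join (illustrative; not the Streams API).
public class MergeVsJoin {
    // merge: interleave the records of the two input streams into one collated stream.
    public static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // outer join: fuse the two inputs by key; a missing side joins as null.
    public static Map<String, Integer[]> outerJoin(Map<String, Integer> left,
                                                   Map<String, Integer> right) {
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<String, Integer[]> out = new LinkedHashMap<>();
        for (String k : keys) {
            out.put(k, new Integer[] {left.get(k), right.get(k)});
        }
        return out;
    }
}
```

With the inputs from the example, merge yields four records while outerJoin yields three keyed pairs, with nulls for B's right side and C's left side.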


Does that make sense?
-John Roesler

On Thu, Aug 9, 2018 at 2:13 PM  wrote:

> Hi All,
>
>
>
>   I am a little confused on the difference between the
> KStreamBuilder merge() function and doing a KStream-to-KStream Join
> operation. I understand the difference between Inner, Left and Outer joins,
> but I don't understand exactly what the difference is between the two. It
> appears to me that both ways would merge two streams into a single stream,
> but the joins do have the ability to remove duplicate data. Is that the
> only difference? Also, on a side note, I am really clueless as to what the
> difference between Windowed and Windowless means when referring to the
> joins.
>
>
>
>   Any help would be greatly appreciated. Thank you.
>
>
>
> John Heller
>
>


Re: Controlled topics creation in Kafka cluster, how does that affect a Kafka Streams App that uses join?

2018-09-05 Thread John Roesler
Hi Meeiling,

It sounds like "formal process" is more like "fill out a form beforehand"
than just "access controlled".

In that case, although it's somewhat of an internal detail, I think the
easiest thing would be to figure out what the name of the internal topics
will be and request their creation before you run the app.

The names of the internal topics are predictable, based on the application
id and the app topology, and it will be stable as long as you don't change
the app.

I think the simplest approach would be to run it locally and just see which
topics get created.
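[Editor's note: for orientation, the internal topics follow a predictable pattern — changelog topics are named `<application.id>-<state store name>-changelog` and repartition topics `<application.id>-<operator name>-repartition`. The operator names (e.g. "KSTREAM-KEY-SELECT-0000000003", used below as a hypothetical example) come from the topology, which is why running the app locally or inspecting Topology.describe() is the easiest way to confirm them. A sketch of the convention:]

```java
// Sketch of Kafka Streams' internal topic naming convention. Verify against
// Topology.describe() for your actual app; operator names vary by topology.
public class InternalTopics {
    public static String changelog(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static String repartition(String applicationId, String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }
}
```

Knowing these names up front lets you request their creation through a formal provisioning process before deploying the app.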

Does that help?
-John

On Tue, Sep 4, 2018 at 10:07 PM Meeiling.Bradley <
meeiling.brad...@target.com> wrote:

> Would I be able to run a Kafka Streams application containing a
> KStream-KStream join that consumes from and produces to a Kafka cluster
> that has a formal process for provisioning topic creation? Behind the
> scenes, internal repartition and changelog topics need to be
> created for the join operation. These topics are created as the Kafka
> Streams app executes; if topic creation is a controlled process, is there
> a way to bypass that while my application is running?
>
> Meeiling Bradley  |  Lead BI Engineer  |  • EDABI  |   33 South 6th
> Street, CC-2002  |  Minneapolis, MN 55402
>
>
>
>


Re: Best way to manage topics using Java API

2018-09-05 Thread John Roesler
Hi Robin,

The AdminClient is what you want.

"Evolving" is just a heads-up that the API is relatively new and hasn't
stood the test of time, so you shouldn't be *too* surprised to see it
change in the future.
That said, even APIs marked "Evolving" need to go through the KIP process
to be changed, and would almost certainly be deprecated for a while before
being removed.

Does this help?
-John

On Mon, Sep 3, 2018 at 7:57 AM Robin Perice 
wrote:

> Hi,
>
> I'm currently using *kafka_2.11* *1.0.0***and I want to list, create,
> delete topics directly in from my Java code.
>
> When searching on stackoverflow, all examples are using *AdminUtils*
> (kafka.admin.AdminUtils) which is not really documented.
>
> I found in the documentation that I should use the *AdminClient* API
> (http://kafka.apache.org/10/documentation.html#adminapi), which is well
> documented. But this interface's stability is marked as 'Evolving' (I
> don't think we are going to update the Kafka version soon on my project).
>
> So I'm asking: what is the best way to manage topics from Java?
>
>
> Regards,
>
> Robin
>
>
> *--*
> Robin PERICE
> INGENIEUR DEVELOPPEMENT LOGICIEL
> Tel.: +33 (0) 5 62 88 80 39
>
>


Re: Timing state changes?

2018-09-06 Thread John Roesler
Hi Tim,

From your spec, I think that Kafka Streams has several ways to support
either scenario. Caveat: this is off the cuff, so I might have missed
something. For your context, I'll give you some thoughts on several ways
you could do it, with various tradeoffs.

Scenario 1 (every widget reports its current (connected/disconnected) state
every 30s):

Windowed with final results:
The easiest implementation would depend on a feature we have planned for
version 2.1:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables
For this, you can do a sliding 5-minute window (advance by 30s). Since all
widgets "phone in" during the window, you can just count the number of
"connected" messages and alert at the end of the window for any widget
whose count is 0. The "default" mode for stream processing is to
continuously emit updates to the windowed aggregation, so a widget that is
disconnected at the start of the window would be reported as 0 first, then
transition to 1 when it connects. To avoid false positives, you'll want to
"suppress" the intermediate results, which is not available currently but
I'm working on it for 2.1 .

As a stop-gap solution until final results are available, you can still do
sliding windows, but with a longer window size, say a 6-minute window,
advanced by 30s. Instead of a count, you can do a reduce that sets the
value to the time of the last connection. Transforming this to the time
*since* the last connection, you'll have an opportunity to send an alert if
a widget was last connected more than 5 minutes ago. This won't cause false
positives, but it might result in double-reporting. This can be a little
more compact than the count-based one, since it only needs to store alert
candidates (widgets that are currently connected don't need to be stored).
So once the "final results" feature is released, you can transition this
approach to fix the double-reporting problem by using a 5-minute window and
enabling "final result" mode.

Note that any of the windowed solutions would generate an alert for *every*
window in which the widget hasn't been connected. Assuming some widget
stays disconnected, the sliding window would generate a fresh alert every
30s. Instead you could use tumbling windows, which would give you a fresh
alert every 5 minutes. This doesn't sound like what you want.

Scenario 2 (widgets only report when they change state):
As you noticed, a windowed computation won't work here, because you would
be wanting to alert on things that are absent from the window.

Instead, you can use a custom Processor with a Key/Value store and schedule
punctuations to send the alerts. For example, you can store the state and
the time of the transition to that state, and finally whether an alert has
been sent for that widget. You can schedule a punctuation to scan over the
whole store. For any record that's been disconnected for more than 5m and
*not* sent an alert, you send the alert and set the "sent" flag. Since you
only need to consider widgets whose last transition was a disconnect and
that have *not* had an alert sent, you can keep the store pretty compact by
dropping entries when you send the alert or when they transition from
"disconnected to connected". So the store doesn't need to contain any
widget whose state is currently "connected" or who is disconnected and has
already been alerted.

Just considering these two implementations, the custom processor + state
store can be much more compact, since it only needs to store alert
candidates, whereas the windowed computation needs to store the candidates
for each active window (for tumbling windows, this is still just one, but
for sliding windows, it's 5 minutes / 30 seconds = 10 windows).
Also, the custom processor lets you send the alert just once for each
widget that stays disconnected longer than 5m, so this is sounding better
to me.
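[Editor's note: the store-and-punctuate logic described above can be modeled in plain Java. This is an illustrative sketch with invented names; a real implementation would live in a custom Processor using a KeyValueStore, with the scan scheduled as a punctuation via the processor context.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model of the disconnect-alert processor state (illustrative only).
public class DisconnectAlerts {
    static final long THRESHOLD_MS = 5 * 60 * 1000L;

    // widget id -> time it transitioned to disconnected.
    // Connected widgets and already-alerted widgets are absent, keeping the store compact.
    final Map<String, Long> disconnectedSince = new HashMap<>();

    public void onEvent(String widget, boolean connected, long timestampMs) {
        if (connected) {
            disconnectedSince.remove(widget); // reconnected: drop the candidate
        } else {
            disconnectedSince.putIfAbsent(widget, timestampMs); // keep first disconnect time
        }
    }

    // Punctuation: alert (once) for every widget disconnected longer than the threshold.
    public List<String> punctuate(long nowMs) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = disconnectedSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() >= THRESHOLD_MS) {
                alerts.add(e.getKey());
                it.remove(); // alerted: the entry is no longer needed
            }
        }
        return alerts;
    }
}
```

Note that each widget is alerted at most once per disconnect episode, matching the single-alert behavior the custom-processor approach buys over the windowed one.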

I hope this helps!
-John

On Thu, Sep 6, 2018 at 11:00 AM Tim Ward  wrote:

> Hi, looking for some beginner's help with architectural choices.
>
> Scenario 1:
>
> A stream of messages arrives on a Kafka topic. Each has the key being a
> widget ID, and the value being an indication of whether the widget is
> connected or disconnected. For each widget a message arrives every 30
> seconds with the current connected state.
>
> Requirement: emit a message when a widget has been reported as
> disconnected for at least five minutes.
>
> Constraints: there can be large numbers of widgets, so horizontal
> scalability is required. The solution must be robust against the
> application, or instances of it, crashing, so persistence or regeneration
> of any local/internal state is required, and committing the input messages
> has to be got right so that they're reprocessed on a restart if and as
> necessary.
>
> Observation: without the constraints we'd just read each message, keep an
> in-memory list of disconnected widgets with the timestamp at which they
> went disconnected, 

Re: SAM Scala aggregate

2018-09-10 Thread John Roesler
In addition to the other suggestions, if you're having too much trouble
with the interface, you can always fall back to creating anonymous
Initializer/Aggregator instances the way you would if programming in Java.
This way, you wouldn't need SAM conversion at all (all it's doing is
turning your functions into an Initializer and Aggregator).
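To illustrate the anonymous-instance style, here is a self-contained sketch using locally declared stand-ins for the real org.apache.kafka.streams.kstream.Initializer and Aggregator interfaces (the shapes match, but these are not the Kafka types):

```java
// Stand-in single-method interfaces mirroring the Streams ones.
interface Initializer<VA> { VA apply(); }
interface Aggregator<K, V, VA> { VA apply(K key, V value, VA aggregate); }

// Spelling the instances out as anonymous classes means no SAM
// conversion is needed at all, in Scala 2.11 or anywhere else.
class AnonymousStyle {
    static final Initializer<Integer> INIT = new Initializer<Integer>() {
        @Override public Integer apply() { return 0; }
    };

    static final Aggregator<String, Integer, Integer> SUM =
        new Aggregator<String, Integer, Integer>() {
            @Override public Integer apply(String key, Integer value, Integer agg) {
                return agg + value; // running sum per key
            }
        };
}
```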

On Mon, Sep 10, 2018 at 8:32 AM Sean Glover 
wrote:

> Hi,
>
> SAM conversions can be enabled in 2.11 with the -Xexperimental to scalac.
> However, this version of SAM conversions isn't recommended for production
> and was significantly refactored for 2.12.  When we contributed this API to
> Kafka we explicitly removed the SAM conversions because we didn't want to
> require the Kafka build (or end user builds) to enable this experimental
> feature.
>
> Echoing what Debasish has already said, the lightbend/kafka-streams-scala
> project is deprecated.  If the app you're building is destined for production
> you should use the Kafka 2.0+ client libraries which are backwards
> compatible with older versions of the brokers; you don't need to get
> everyone else to upgrade their clients or your cluster.
>
> Regards
> Sean
>
> On Sun, Sep 9, 2018 at 3:59 PM Michael Eugene  wrote:
>
> > I’m integrating with a lot of other applications and it would be a little
> > cowboy to choose my own version.  I can recommend that we all go to 2.0
> but
> > there are already working applications, so it won’t be possible to
> upgrade
> > everything that everyone else is doing.  That will have to happen
> sometime
> > though. It’s just not gonna happen this week and my goal is to get
> > something working by tomorrow and I’ll stay up all night if I have to.
> >
> > Sent from my iPhone
> >
> > > On Sep 9, 2018, at 2:11 PM, Matthias J. Sax 
> > wrote:
> > >
> > > Why can't you use Kafka Streams 2.0?
> > >
> > > Note: Kafka Streams is backward compatible and it can connect to older
> > > brokers -- it's not required to upgrade your cluster to use Kafka
> > > Streams 2.0 -- updating you maven/gradle dependency is sufficient.
> > >
> > > Also, AFAIK SAM conversions are only available in Scala 2.12.
> > >
> > >
> > > -Matthias
> > >
> > >> On 9/9/18 11:37 AM, Michael Eugene wrote:
> > >> I can’t do Kafka 2.0. I am limited to this version right now.  If I
> > continue to struggle with it this much, I can eventually do that.
> However,
> > I know other people in the organization have Kafka working with
> > Scala. Probably not a good idea to say it’s a necessity when it’s not
> > completely necessary. Your point is well taken though, I am considering
> it.
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On Sep 9, 2018, at 1:10 PM, Debasish Ghosh  >
> > wrote:
> > >>>
> > >>> I don't have the environment to run the Scala code right now. Will be
> > >>> tomorrow until I have one ..
> > >>>
> > >>> Now that Scala API is part of the official Kafka distribution. Can u
> > please
> > >>> try that out instead of kafka-streams-scala ? The library is now
> > deprecated
> > >>> and I remember we ran into some SAM related issues with Scala 2.11
> > (which
> > >>> worked fine with 2.12). They were finally fixed in the Kafka
> > distribution -
> > >>> there are some differences in the APIs as well ..
> > >>>
> > >>> regards.
> > >>>
> >  On Sun, Sep 9, 2018 at 11:32 PM Michael Eugene 
> > wrote:
> > 
> >  I’m using 2.11.11
> > 
> >  Sent from my iPhone
> > 
> > > On Sep 9, 2018, at 12:13 PM, Debasish Ghosh <
> > ghosh.debas...@gmail.com>
> >  wrote:
> > >
> > > Which version of Scala are u using ?
> > >
> > >> On Sun, 9 Sep 2018 at 10:44 AM, Michael Eugene <
> far...@hotmail.com>
> >  wrote:
> > >>
> > >> Hi,
> > >>
> > >> I am using kafak-sreams-scala
> > >> https://github.com/lightbend/kafka-streams-scala, and I am trying
> > to
> > >> implement something very simple and I am getting a compilation
> > error by
> >  the
> > >> "aggregate" method. The error is "Cannot resolve overload method
> > >> 'aggregate'" and "Unspecified value parameters: materialized:
> > >> Materialized[String, NotInferedVR, KeyValueStore[Bytes,
> > Array[Byte]]]"
> > >>
> > >> However when I add a third argument for a Materialized, I g

Re: Timing state changes?

2018-09-12 Thread John Roesler
Hi Tim,

The general approach used by Streams is resilience by wrapping all state
updates in a "changelog topic". That is, when Streams updates a key/value
pair in the state store, it also sends the update to a special topic
associated with that store. The record is only considered "committed" aka
"fully processed" after the write to the changelog topic is acknowledged
(among other writes).

Upon start-up (or restart), we'll attempt to use any on-disk state stores
we find, but if they are missing or corrupted, we'll just rebuild the whole
state store from the changelog topic.

So the durability is actually provided by the kafka brokers, and using
on-disk state stores is an optimization where available. Thus, you should
be fine if you completely lose the disk at any phase of processing (before
or after a disconnect).

However, depending on the size of your state store, you may find that
recovery from the changelog is on the slow side. For this reason, it's
probably a good idea to use stateful sets with K8s if possible.

Does this help?
Thanks,
-John

On Wed, Sep 12, 2018 at 7:50 AM Tim Ward  wrote:

> From: John Roesler 
> > As you noticed, a windowed computation won't work here, because you would
> > be wanting to alert on things that are absent from the window.
>
> > Instead, you can use a custom Processor with a Key/Value store and
> schedule
> > punctuations to send the alerts. For example, you can store the state and
> > the time of the transition to that state, and finally whether an alert
> has
> > been sent for that widget. You can schedule a punctuation to scan over
> the
> > whole store. For any record that's been disconnected for more than 5m and
> > *not* sent an alert, you send the alert and set the "sent" flag. Since
> you
> > only need to consider widgets whose last transition was a disconnect and
> > that have *not* had an alert sent, you can keep the store pretty compact
> by
> > dropping entries when you send the alert or when they transition from
> > "disconnected to connected". So the store doesn't need to contain any
> > widget whose state is currently "connected" or who is disconnected and
> has
> > already been alerted.
>
> Ta, I'll give something like that a try (the other scenario is simpler so
> I'll do the harder one first).
>
> One question: how does resilience work? If for example the application
> crashes after receiving a "disconnected" message but before timing it out?
> Does the preservation of the local data store across application restarts
> just sort all this out for me automagically so I don't have to worry about
> it? (I'll be deploying to Kubernetes, and applications going away and
> restarting at random seems to be a fact of life there.)
>
> Tim Ward
> The contents of this email and any attachment are confidential to the
> intended recipient(s). If you are not an intended recipient: (i) do not
> use, disclose, distribute, copy or publish this email or its contents; (ii)
> please contact the sender immediately; and (iii) delete this email. Our
> privacy policy is available here:
> https://origamienergy.com/privacy-policy/. Origami Energy Limited
> (company number 8619644); Origami Storage Limited (company number 10436515)
> and OSSPV001 Limited (company number 10933403), each registered in England
> and each with a registered office at: Ashcombe Court, Woolsack Way,
> Godalming, GU7 1LQ.
>


Re: KStreams / KSQL processes that span multiple clusters

2018-09-12 Thread John Roesler
Hi Elliot,

This is not currently supported, but I, for one, think it would be awesome.
It's something I have considered tackling in the future.

Feel free to create a Jira ticket asking for it (but please take a minute
to search for preexisting tickets).

Offhand, my #1 concern would be how it works with EOS, but this is probably
a surmountable problem.

As a workaround if you need this property, you could write a "bridge"
application with a consumer connected to A and producer connected to B, and
mirror the input topics from A to B, then process fully on B. And the
output to C would look similar. Come to think of it, you can probably use
Mirror Maker for this.
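The bridge's inner loop is just "drain from A, republish to B". As a self-contained sketch, with queues standing in for the KafkaConsumer on cluster A and the KafkaProducer on cluster B (a real bridge would be a consumer poll loop feeding producer sends, which is what Mirror Maker amounts to):

```java
import java.util.Queue;

// Minimal shape of the bridge: copy whatever is available from the
// source to the sink, preserving order. Real code would also carry
// record keys and handle send acknowledgements.
class Bridge {
    static int mirror(Queue<String> consumerFromA, Queue<String> producerToB) {
        int copied = 0;
        for (String record; (record = consumerFromA.poll()) != null; copied++) {
            producerToB.add(record);
        }
        return copied;
    }
}
```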

Thanks,
-John

On Wed, Sep 12, 2018 at 4:38 AM Elliot West  wrote:

> Hello,
>
> Apologies if this is a naïve question, but I'd like to understand if and
> how KStreams and KSQL can deal with topics that reside in more than one
> cluster. For example, is it possible for a single KS[treams|QL] application
> to:
>
>1. Source from a topic on cluster A
>2. Produce/Consume to intermediate topics on cluster B
>3. Sink a final results to cluster C
>
> Or is it the case that all events read/written by such an application must
> exist on the same Kafka cluster?
>
> Thanks,
>
> Elliot.
>


Re: SAM Scala aggregate

2018-09-12 Thread John Roesler
I'm not 100% certain, but you might need to do "import
_root_.scala.collection.JavaConverters._" etc. Sometimes, you run into
trouble with ambiguity if the compiler can't tell if "scala" references the
top-level package or the intermediate package inside Streams.

Hope this helps!
-John

On Wed, Sep 12, 2018 at 3:02 PM Michael Eugene  wrote:

> Hey thanks for the help everyone, I’m gonna use the new scala 2.0
> libraries.  I’m getting the craziest error when building this though, but I’m
> not a maven expert. I have to use maven right now (not sbt) because I don’t
> own this project at work.  Anyway whenever I add the maven dependency -
> 
>   org.apache.org
>kafka-streams-scala_2.11
>2.0.0
> 
>
> I then get all kinds of errors in this code that already runs. Like if I
> leave the old Lightbend Scala wrapper code in there and just bring in this new
> dependency into the Pom file, I can’t even do some import statements.  Like
> when I do “import scala.collection.JavaConverters._” then I get the
> following error - “object ‘language’ is not a member of package
> org.apache.kafka.streams.scala”.
>
> So what can I be checking to make sure that won’t do this?
>
> Thanks!
>
> Sent from my iPhone
>
> > On Sep 10, 2018, at 9:48 PM, Liam Clarke 
> wrote:
> >
> > Hi Michael Eugene,
> >
> > You're correct - you only need to upgrade your Kafka Streams dependencies
> > in your build file. Looking at MVN Repository, the streams lib will
> > implicitly bring in its dependency on kafka-clients, but you can always
> > include your own explicit dependency on it.
> > https://mvnrepository.com/artifact/org.apache.kafka/kafka-streams/2.0.0
> >
> > Kind regards,
> >
> > Liam Clarke
> >
> >> On Tue, Sep 11, 2018 at 1:43 PM, Michael Eugene 
> wrote:
> >>
> >> Well to bring up that kafka to 2.0, do I just need for sbt kafka clients
> >> and kafka streams 2.0 for sbt?  And it doesn't matter if the system is
> not
> >> Kafka 2.0? Upgrading Kafka itself is probably not an option for me right
> now.
> >>
> >>
> >>
> >>
> >>
> >> 
> >> From: John Roesler 
> >> Sent: Monday, September 10, 2018 4:18:24 PM
> >> To: users@kafka.apache.org
> >> Subject: Re: SAM Scala aggregate
> >>
> >> In addition to the other suggestions, if you're having too much trouble
> >> with the interface, you can always fall back to creating anonymous
> >> Initializer/Aggregator instances the way you would if programming in
> Java.
> >> This way, you wouldn't need SAM conversion at all (all it's doing is
> >> turning your functions into an Initializer and Aggregator).
> >>
> >> On Mon, Sep 10, 2018 at 8:32 AM Sean Glover 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> SAM conversions can be enabled in 2.11 with the -Xexperimental to
> scalac.
> >>> However, this version of SAM conversions isn't recommended for
> production
> >>> and was significantly refactored for 2.12.  When we contributed this
> API
> >> to
> >>> Kafka we explicitly removed the SAM conversions because we didn't want
> to
> >>> require the Kafka build (or end user builds) to enable this
> experimental
> >>> feature.
> >>>
> >>> Echoing what Debasish has already said, the
> lightbend/kafka-streams-scala
> >>> project is deprecated.  If the app your building is destined for
> >> production
> >>> you should use the Kafka 2.0+ client libraries which are backwards
> >>> compatible with older versions of the brokers; you don't need to get
> >>> everyone else to upgrade their clients or your cluster.
> >>>
> >>> Regards
> >>> Sean
> >>>
> >>> On Sun, Sep 9, 2018 at 3:59 PM Michael Eugene 
> >> wrote:
> >>>
> >>>> I’m integrating with lot of other applications and it would be a
> little
> >>>> cowboy to chose my own version.  I can commend that we all went to 2.0
> >>> but
> >>>> there are already working application me so it won’t be possible to
> >>> upgrade
> >>>> everything that everyone else is doing.  That will have to happen
> >>> sometime
> >>>> though. It’s just not gonna happen this week and my goal is to get
> >>>> something working by tomorrow and I’ll stay up all night if I have to.

Re: Best way for reading all messages and close

2018-09-14 Thread John Roesler
Specifically, you can monitor the "records-lag-max" (
https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics)
metric. (or the more granular one per partition).

Once this metric goes to 0, you know that you've caught up with the tail of
the log.
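The shutdown condition can be sketched like this, with a LongSupplier standing in for reading the "records-lag-max" value out of the consumer's metrics (the Kafka API calls themselves are omitted; only the loop shape is shown):

```java
import java.util.function.LongSupplier;

// Keep "polling" until the observed max lag reaches 0, or give up after
// maxPolls attempts. Returns the number of polls it took, or -1.
class CatchUpWaiter {
    static int pollsUntilCaughtUp(LongSupplier recordsLagMax, int maxPolls) {
        for (int polls = 1; polls <= maxPolls; polls++) {
            // in a real app: consumer.poll(...), process records, then check
            if (recordsLagMax.getAsLong() == 0L) {
                return polls;
            }
        }
        return -1; // never caught up within maxPolls
    }
}
```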

Hope this helps,
-John

On Fri, Sep 14, 2018 at 2:02 PM Matthias J. Sax 
wrote:

> Using Kafka Streams this is a little tricky.
>
> The API itself has no built-in mechanism to do this. You would need to
> monitor the lag of the application, and if the lag is zero (assuming you
> don't write new data into the topic in parallel), terminate the
> application.
>
>
> -Matthias
>
> On 9/14/18 4:19 AM, Henning Røigaard-Petersen wrote:
> > Spin up a consumer, subscribe to EOF events, assign all partitions from
> the beginning, and keep polling until all partitions have reached EOF.
> > Though, if you have concurrent writers, new messages may be appended
> after you observe EOF on a partition, so you are never guaranteed to have
> read all messages at the time you choose to close the consumer.
> >
> > /Henning Røigaard-Petersen
> >
> > -Original Message-
> > From: David Espinosa 
> > Sent: 14. september 2018 09:46
> > To: users@kafka.apache.org
> > Subject: Best way for reading all messages and close
> >
> > Hi all,
> >
> > Although the usage of Kafka is stream oriented, for a concrete use case
> I need to read all the messages existing in a topic and, once all of them
> have been read, close the consumer.
> >
> > What's the best way or framework for doing this?
> >
> > Thanks in advance,
> > David,
> >
>
>


Re: Best way for reading all messages and close

2018-09-17 Thread John Roesler
Yep, that should also work!
-John
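For reference, the stopping condition in the recipe quoted below reduces to a per-partition comparison. A plain-Java sketch (in a real consumer, endOffsets() would supply the high watermarks captured at the start and position() the current offsets; maps over partition numbers stand in for both, and later appends are ignored, matching the earlier caveat about concurrent writers):

```java
import java.util.Map;

// True once every partition has been read up to the high watermark
// captured before consumption started.
class BoundedRead {
    static boolean caughtUp(Map<Integer, Long> highWatermarks,
                            Map<Integer, Long> positions) {
        for (Map.Entry<Integer, Long> e : highWatermarks.entrySet()) {
            if (positions.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false; // still behind on this partition
            }
        }
        return true;
    }
}
```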

On Mon, Sep 17, 2018 at 8:36 AM David Espinosa  wrote:

> Thank you all for your responses!
> I also asked this on the confluent slack channel (
> https://confluentcommunity.slack.com) and I got this approach:
>
>1. Query the partitions' high watermark offset
>2. Set the consumer to consume from beginning
>3. Break out when you've reached the high offset
>
> Still have some doubts regarding the implementation, but it seems a good
> approach (I'm using a single partition so a single loop would be enough per
> topic).
> What do you think?
>
> On Sat, Sep 15, 2018 at 0:30, John Roesler ()
> wrote:
>
> > Specifically, you can monitor the "records-lag-max" (
> > https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics)
> > metric. (or the more granular one per partition).
> >
> > Once this metric goes to 0, you know that you've caught up with the tail
> of
> > the log.
> >
> > Hope this helps,
> > -John
> >
> > On Fri, Sep 14, 2018 at 2:02 PM Matthias J. Sax 
> > wrote:
> >
> > > Using Kafka Streams this is a little tricky.
> > >
> > > The API itself has no built-in mechanism to do this. You would need to
> > > monitor the lag of the application, and if the lag is zero (assuming
> you
> > > don't write new data into the topic in parallel), terminate the
> > > application.
> > >
> > >
> > > -Matthias
> > >
> > > On 9/14/18 4:19 AM, Henning Røigaard-Petersen wrote:
> > > > Spin up a consumer, subscribe to EOF events, assign all partitions
> from
> > > the beginning, and keep polling until all partitions has reached EOF.
> > > > Though, if you have concurrent writers, new messages may be appended
> > > after you observe EOF on a partition, so you are never guaranteed to
> have
> > > read all messages at the time you choose to close the consumer.
> > > >
> > > > /Henning Røigaard-Petersen
> > > >
> > > > -Original Message-
> > > > From: David Espinosa 
> > > > Sent: 14. september 2018 09:46
> > > > To: users@kafka.apache.org
> > > > Subject: Best way for reading all messages and close
> > > >
> > > > Hi all,
> > > >
> > > > Although the usage of Kafka is stream oriented, for a concrete use
> case
> > > I need to read all the messages existing in a topic and once all them
> has
> > > been read then closing the consumer.
> > > >
> > > > What's the best way or framework for doing this?
> > > >
> > > > Thanks in advance,
> > > > David,
> > > >
> > >
> > >
> >
>


Re: Low level kafka consumer API to KafkaStreams App.

2018-09-17 Thread John Roesler
Hey Praveen,

I also suspect that you can get away with far fewer threads. Here's the
general starting point I recommend:

* start with just a little over 1 thread per hardware thread (accounting
for cores and hyperthreading). For example, on my machine, I have 4 cores
with 2 threads of execution each, so I would configure the application with
8 or maybe 9 threads. Much more than that introduces a *lot* of CPU/memory
overhead in exchange for not much gain (if any).
* choose a number of partitions that would allow you to scale up to a
reasonable number of machines, with respect to the numbers you get above.

From there, take a close look at all your important machine metrics (cpu,
memory, disk, network) as well as processing metrics (task throughput (how
long your application code takes), end-to-end processing throughput (how
long the full processing lifecycle takes, including the broker roundtrips)).

If there's any resource not saturated, you can tweak various configurations
to try and saturate it. I would think that stuff like buffer size and batch
size would be more helpful with less overhead than number of threads.

But keep a close look at your throughputs each time you make a change, to
be sure you're not locally optimizing at the expense of global performance.

I hope this helps!
-John

On Thu, Sep 13, 2018 at 4:53 PM Svante Karlsson 
wrote:

> You are doing something wrong if you need 10k threads to produce 800k
> messages per second. It feels you are a factor of 1000 off. What size are
> your messages?
>
> On Thu, Sep 13, 2018, 21:04 Praveen  wrote:
>
> > Hi there,
> >
> > I have a kafka application that uses kafka consumer low-level api to help
> > us process data from a single partition concurrently. Our use case is to
> > send out 800k messages per sec. We are able to do that with 4 boxes using
> > 10k threads and each request taking 50ms in a thread. (1000/50*1*4)
> >
> > I understand that kafka in general uses partitions as its parallelism
> > model. It is my understanding that if I want the exact same behavior with
> > kafka streams, I'd need to create 40k partitions for this topic. Is that
> > right?
> >
> > What is the overhead on creating thousands of partitions? If we end up
> > wanting to send out millions of messages per second, is increasing the
> > partitions the only way?
> >
> > Best,
> > Praveen
> >
>


Re: [KafkaStreams 1.1.1] partition assignment broken?

2018-10-08 Thread John Roesler
Hi Bart,

This sounds a bit surprising. Is there any chance you can zip up some logs
so we can see the assignment protocol on the nodes?

Thanks,
-John

On Mon, Oct 8, 2018 at 4:32 AM Bart Vercammen  wrote:

> Hi,
>
> I recently moved some KafkaStreams applications from v0.10.2.1 to v1.1.1
> and now I notice a weird behaviour in the partition assignment.
> When starting 4 instances of my Kafka Streams application (on v1.1.1) I see
> that 17 of the 20 partitions (of a source topic) are assigned to 1 instance
> of the application while the other 3 instances only get 1 partition
> assigned. (previously (on v0.10.2.1) they all got 5 partitions.)
>
> Is this expected behaviour, as I read that quite some improvements were
> done in the partition assignment strategy for Kafka Streams applications?
> If yes, how can I make it so that the partitions are equally divided again
> across all running applications?   It's a bit weird in my opinion as this
> makes scaling the application very hard.
>
> Also, when initially starting with 1 instance of the application, and
> gradually scaling up, the new instances only get 1 partition assigned ...
>
> All my Streams applications use default configuration (more or less),
> running 1 stream-thread.
>
> Any suggestions / enlightenments on this?
> Greets,
> Bart
>


Re: [KafkaStreams 1.1.1] partition assignment broken?

2018-10-08 Thread John Roesler
Hi Bart,

I suspected it might not be feasible to just dump your production logs onto
the internet.

A repro would be even better, but I bet it wouldn't show up when you try
and reproduce it. Good luck!

If the repro doesn't turn out, maybe you could just extract the assignment
lines from your logs?

Thanks,
-John

On Mon, Oct 8, 2018 at 1:24 PM Bart Vercammen  wrote:

> Hi John,
>
> Zipping up some logs from our running Kafka cluster is going to be a bit
> difficult.
> What I can do is try to reproduce this off-line and capture the logs from
> there.
>
> We also had a look in the PartitionAssignor source code (for 1.1.1) and
> indeed this behaviour is a bit weird
> as from the source code I'd expect equally divided partitions.
>
> Anyway, hopefully I'll be able to reproduce this issue with some simple
> unit-test like code.
> I'll post the results when I have more info.
>
> Greets,
> Bart
>
> On Mon, Oct 8, 2018 at 7:36 PM John Roesler  wrote:
>
> > Hi Bart,
> >
> > This sounds a bit surprising. Is there any chance you can zip up some
> logs
> > so we can see the assignment protocol on the nodes?
> >
> > Thanks,
> > -John
> >
> > On Mon, Oct 8, 2018 at 4:32 AM Bart Vercammen  wrote:
> >
> > > Hi,
> > >
> > > I recently moved some KafkaStreams applications from v0.10.2.1 to
> v1.1.1
> > > and now I notice a weird behaviour in the partition assignment.
> > > When starting 4 instances of my Kafka Streams application (on v1.1.1) I
> > see
> > > that 17 of the 20 partitions (of a source topic) are assigned to 1
> > instance
> > > of the application while the other 3 instances only get 1 partition
> > > assigned. (previously (on v0.10.2.1) they all got 5 partitions.)
> > >
> > > Is this expected behaviour, as I read that quite some improvements were
> > > done in the partition assignment strategy for Kafka Streams
> applications?
> > > If yes, how can I make it so that the partitions are equally divided
> > again
> > > across all running applications?   It's a bit weird in my opinion as
> this
> > > makes scaling the application very hard.
> > >
> > > Also, when initially starting with 1 instance of the application, and
> > > gradually scaling up, the new instances only get 1 partition assigned
> ...
> > >
> > > All my Streams applications use default configuration (more or less),
> > > running 1 stream-thread.
> > >
> > > Any suggestions / enlightenments on this?
> > > Greets,
> > > Bart
> > >
> >
>
>
> --
> Mvg,
> Bart Vercammen
>
>
> clouTrix BVBA
> +32 486 69 17 68
> i...@cloutrix.com
>


Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-10-26 Thread John Roesler
Hi Patrik,

Just to drop one observation in... Streaming to a topic and then consuming
it as a table does create overhead, but so does reducing a stream to a
table, and I think it's actually the same in either case.

They both require a store to collect the table state, and in both cases,
the stores need to have a changelog topic. For the "reduce" version, it's
an internal changelog topic, and for the "topic-to-table" version, the
store can use the intermediate topic as its changelog.

This doesn't address your ergonomic concern, but it seemed worth pointing
out that (as far as I can tell), there doesn't seem to be a difference in
overhead.

Hope this helps!
-John

On Fri, Oct 26, 2018 at 3:27 AM Patrik Kleindl  wrote:

> Hello Matthias,
> thank you for the explanation.
> Streaming back to a topic and consuming this as a KTable does respect the
> null values as deletes, correct? But at the price of some overhead.
> Is there any (historical, technical or emotional;-)) reason that no simple
> one-step stream-to-table operation exists?
> Best regards
> Patrik
>
> > On 26.10.2018 at 00:07, Matthias J. Sax  wrote:
> >
> > Patrik,
> >
> > `null` values in a KStream don't have delete semantics (it's not a
> > changelog stream). That's why we drop them in the KStream#reduce
> > implementation.
> >
> > If you want to explicitly remove results for a key from the result
> > KTable, your `Reducer#apply()` implementation must return `null` -- the
> > result of #apply() has changelog/KTable semantics and `null` is
> > interpreted as delete for this case.
> >
> > If you want to use `null` from your KStream to trigger reduce() to
> > delete, you will need to use a surrogate value for this, ie, do a
> > mapValues() before the groupByKey() call, and replace `null` values with
> > the surrogate-delete-marker that you can evaluate in `Reducer#apply()`
> > to return `null` for this case.
> >
> > Hope this helps.
> >
> > -Matthias
> >
> >> On 10/25/18 10:36 AM, Patrik Kleindl wrote:
> >> Hello
> >>
> >> Recently we noticed a lot of warning messages in the logs which pointed
> to
> >> this method (we are running 2.0):
> >>
> >> KStreamReduce
> >> public void process(final K key, final V value) {
> >>// If the key or value is null we don't need to proceed
> >>if (key == null || value == null) {
> >>LOG.warn(
> >>"Skipping record due to null key or value. key=[{}]
> >> value=[{}] topic=[{}] partition=[{}] offset=[{}]",
> >>key, value, context().topic(), context().partition(),
> >> context().offset()
> >>);
> >>metrics.skippedRecordsSensor().record();
> >>return;
> >>}
> >>
> >> This was triggered for every record from a stream with an existing key
> but
> >> a null value which we put through groupBy/reduce to get a KTable.
> >> My assumption was that this was the correct way inside a streams
> >> application to get a KTable but this prevents deletion of records from
> >> working.
> >>
> >> Our alternative is to send the stream back to a named topic and build a
> new
> >> table from it, but this is rather cumbersome and requires a separate
> topic
> >> which also can't be cleaned up by the streams reset tool.
> >>
> >> Did I miss anything relevant here?
> >> Would it be possible to create a separate method for KStream to achieve
> >> this directly?
> >>
> >> best regards
> >>
> >> Patrik
> >>
> >
>


Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-10-27 Thread John Roesler
Hi again Patrik,

Actually, this is a good question... Can you share some context about why
you need to convert a stream to a table (including nulls as retractions)?

Thanks,
-John

On Fri, Oct 26, 2018 at 5:36 PM Matthias J. Sax 
wrote:

> I don't know your overall application setup. However, a KStream
> semantically models immutable facts and there is no update semantics.
> Thus, it seems semantically questionable, to allow changing the
> semantics from facts to updates (the other way is easier IMHO, and thus
> supported via KTable#toStream()).
>
> Does this make sense?
>
> Having said this: you _can_ write a KStream into a topic and read it back
> as KTable. But it's semantically questionable to do so, IMHO. Maybe it
> makes sense for your specific application, but in general I don't think
> it does make sense.
>
>
> -Matthias
>
> On 10/26/18 9:30 AM, John Roesler wrote:
> > Hi Patrik,
> >
> > Just to drop one observation in... Streaming to a topic and then
> consuming
> > it as a table does create overhead, but so does reducing a stream to a
> > table, and I think it's actually the same in either case.
> >
> > They both require a store to collect the table state, and in both cases,
> > the stores need to have a changelog topic. For the "reduce" version, it's
> > an internal changelog topic, and for the "topic-to-table" version, the
> > store can use the intermediate topic as its changelog.
> >
> > This doesn't address your ergonomic concern, but it seemed worth pointing
> > out that (as far as I can tell), there doesn't seem to be a difference in
> > overhead.
> >
> > Hope this helps!
> > -John
> >
> > On Fri, Oct 26, 2018 at 3:27 AM Patrik Kleindl 
> wrote:
> >
> >> Hello Matthias,
> >> thank you for the explanation.
> >> Streaming back to a topic and consuming this as a KTable does respect
> the
> >> null values as deletes, correct? But at the price of some overhead.
> >> Is there any (historical, technical or emotional;-)) reason that no
> simple
> >> one-step stream-to-table operation exists?
> >> Best regards
> >> Patrik
> >>
> >>> On 26.10.2018 at 00:07, Matthias J. Sax wrote:
> >>>
> >>> Patrik,
> >>>
> >>> `null` values in a KStream don't have delete semantics (it's not a
> >>> changelog stream). That's why we drop them in the KStream#reduce
> >>> implementation.
> >>>
> >>> If you want to explicitly remove results for a key from the result
> >>> KTable, your `Reducer#apply()` implementation must return `null` -- the
> >>> result of #apply() has changelog/KTable semantics and `null` is
> >>> interpreted as delete for this case.
> >>>
> >>> If you want to use `null` from your KStream to trigger reduce() to
> >>> delete, you will need to use a surrogate value for this, ie, do a
> >>> mapValues() before the groupByKey() call, and replace `null` values with
> >>> the surrogate-delete-marker that you can evaluate in `Reducer#apply()`
> >>> to return `null` for this case.
> >>>
> >>> Hope this helps.
> >>>
> >>> -Matthias
> >>>
> >>>> On 10/25/18 10:36 AM, Patrik Kleindl wrote:
> >>>> Hello
> >>>>
> >>>> Recently we noticed a lot of warning messages in the logs which
> pointed
> >> to
> >>>> this method (we are running 2.0):
> >>>>
> >>>> KStreamReduce
> >>>> public void process(final K key, final V value) {
> >>>>// If the key or value is null we don't need to proceed
> >>>>if (key == null || value == null) {
> >>>>LOG.warn(
> >>>>"Skipping record due to null key or value. key=[{}]
> >>>> value=[{}] topic=[{}] partition=[{}] offset=[{}]",
> >>>>key, value, context().topic(),
> context().partition(),
> >>>> context().offset()
> >>>>);
> >>>>metrics.skippedRecordsSensor().record();
> >>>>return;
> >>>>}
> >>>>
> >>>> This was triggered for every record from a stream with an existing key
> >> but
> >>>> a null value which we put through groupBy/reduce to get a KTable.
> >>>> My assumption was that this was the correct way inside a streams
> >>>> application to get a KTable but this prevents deletion of records from
> >>>> working.
> >>>>
> >>>> Our alternative is to send the stream back to a named topic and build a
> >> new
> >>>> table from it, but this is rather cumbersome and requires a separate
> >> topic
> >>>> which also can't be cleaned up by the streams reset tool.
> >>>>
> >>>> Did I miss anything relevant here?
> >>>> Would it be possible to create a separate method for KStream to
> achieve
> >>>> this directly?
> >>>>
> >>>> best regards
> >>>>
> >>>> Patrik
> >>>>
> >>>
> >>
> >
>
>


Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-11-20 Thread John Roesler
Hi again, Patrik,

You'll probably be interested in this recent Jira:
https://issues.apache.org/jira/browse/KAFKA-7658

You have a good point about the overhead of going through an intermediate
topic... I can see how explicit topic management is an operational burden,
and you're also right that the changelog topic only gets read on state
restoration. That was an oversight on my part.

I think that with KAFKA-7658 and https://github.com/apache/kafka/pull/5779,
you'll have two good options in the future.

To solve your problem *right now*, you can circumvent the null filtering by
wrapping the values of your stream. For example, immediately before the
reduce, you could mapValues and wrap the values with Optional. Then, your
reduce function can unwrap the Optional and return null if it's empty. Does
that make sense?
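To make that workaround concrete, here is a minimal, self-contained sketch of the wrap/unwrap logic (plain Java, no Kafka dependencies; the class and method names are invented for illustration, not part of the Streams API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class OptionalReduceSketch {

    // Simulates the proposed pipeline:
    //   stream.mapValues(Optional::ofNullable)               // nulls survive the KStream filter
    //         .groupByKey()
    //         .reduce((oldV, wrapped) -> wrapped.orElse(null)) // null result = delete
    // A null reduce result has changelog/KTable semantics: the key is removed.
    public static Map<String, Integer> reduceToTable(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> record : stream) {
            // mapValues: wrap the possibly-null value so it is not dropped
            Optional<Integer> wrapped = Optional.ofNullable(record.getValue());
            if (wrapped.isPresent()) {
                table.put(record.getKey(), wrapped.get());   // upsert
            } else {
                table.remove(record.getKey());               // tombstone
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> stream = List.of(
            new SimpleEntry<>("A", 1),
            new SimpleEntry<>("B", 2),
            new SimpleEntry<>("A", null)   // delete marker for key A
        );
        System.out.println(reduceToTable(stream));   // {B=2}
    }
}
```

In the real topology, the wrapping would happen in mapValues() and the unwrapping in your Reducer#apply().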

This comes with an important caveat, though, which is part of the
motivation for this roadblock to begin with:
if your incoming data gets repartitioned in your topology, then the order
of records for the key is not deterministic. This would break the semantics
of your reduce-to-latest function, and, indeed, any non-commutative reduce
function.

For example, if you have a topic like:
dummykey1: {realkey: A, value: 4}
dummykey2: {realkey: A, value: 5}

and you do a groupBy( select realkey )
and then reduce( keep latest value)

Then, if dummykey1 and dummykey2 are in different partitions, the result
would be either A:4 or A:5, depending on which input partition processed
first.
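To see why the result is order-dependent, here is a tiny sketch of the keep-latest reduce in isolation (plain Java, invented names; not the actual Streams implementation):

```java
import java.util.List;

public class KeepLatestReduceSketch {

    // reduce( keep latest value ): whichever record is processed last wins,
    // so the result depends entirely on processing order (non-commutative).
    public static Integer reduceLatest(List<Integer> arrivalOrder) {
        Integer latest = null;
        for (Integer value : arrivalOrder) {
            latest = value;
        }
        return latest;
    }

    public static void main(String[] args) {
        // dummykey1: {realkey: A, value: 4} sits in one input partition,
        // dummykey2: {realkey: A, value: 5} in another; after groupBy(realkey)
        // the interleaving of the two partitions is not deterministic:
        System.out.println(reduceLatest(List.of(4, 5)));   // one partition first -> 5
        System.out.println(reduceLatest(List.of(5, 4)));   // the other first    -> 4
    }
}
```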

We have discussed several times solutions to resolve this issue, but it's
quite complex in the details.

Nevertheless, if you're careful and ensure that you don't have multiple
threads producing the same key into the input topic, and also that you
don't have a repartition in the middle, then this should work for you.

Hope this helps!
-john

On Sun, Nov 18, 2018 at 7:04 PM Guozhang Wang  wrote:

> Hi Patrik,
>
> Thanks for explaining your use case to us. While we can still discuss how
> KStream should interpret null-values in aggregations, one workaround atm:
> if your deduplication logic can be written as a transformValues operation,
> you can do the following:
>
>
> builder.table("source-topic").transformValues(...
> Materialized.as("store-name"))
>
> Note that in a recent PR that we are merging, the source KTable from
> builder.table() would not be materialized if users do not specify a
> materialized store name, only the value-transformed KTable will be
> materialized:
>
> https://github.com/apache/kafka/pull/5779
>
>
> Would that work for you?
>
> Guozhang
>
>
> On Mon, Oct 29, 2018 at 2:08 AM Patrik Kleindl  wrote:
>
> > Hi John and Matthias
> > thanks for the questions, maybe explaining our use case helps a bit:
> > We are receiving CDC records (row-level insert/update/delete) in one
> topic
> > per table. The key is derived from the DB records, the value is null in
> > case of deletes. Those would be the immutable facts I guess.
> > These topics are first streamed through a deduplication Transformer to
> drop
> > changes on irrelevant fields.
> > The results are translated to KTables and joined to each other to
> represent
> > the same result as the SQLs on the database, but faster. At this stage
> the
> > delete/null records matter because if a record gets deleted then we want
> it
> > to drop out of the join too. -> Our reduce-approach produced unexpected
> > results here.
> > We took the deduplication step separately because in some cases we only
> > need the KStream for processing.
> > If you see a simpler/cleaner approach here I'm open to suggestions, of
> > course.
> >
> > Regarding the overhead:
> > 1) Named topics create management/maintenance overhead because they have
> to
> > be created/treated separately (auto-create is not an option) and be
> > considered in future changes, topology changes/resets and so on. The
> > internal topic removes most of those issues.
> > 2) One of our developers came up with the question if the traffic to/from
> > the broker was actually the same in both scenarios, we expect that the
> same
> > is written to the broker for the named topic as well as the reduce-case,
> > but if the KTable is maintained inside a streams topology, does it have
> to
> > read back everything it sends to the broker or can it keep the table
> > internally? I hope it is understandable what I mean, otherwise I can try
> > the explain it more clearly.
> >
> > best regards
> >
> > Patrik
> >
> >
> > On Sat, 27 Oct 2018 at 23:50, John Roesler  wrote:
> >
> > > Hi again Patrik,
> > >
> > > A

Re: Kafka Streams 2.1.0, 3rd time data lose investigation

2019-01-03 Thread John Roesler
Hi Nitay,

I'm sorry to hear of these troubles; it sounds frustrating.

No worries about spamming the list, but it does sound like this might be
worth tracking as a bug report in Jira.
Obviously, we do not expect to lose data when instances come and go,
regardless of the frequency, and we do have tests in place to verify this.
Of course, you might be exercising something that our tests miss.

Thanks for collating the logs. It really helps to understand what's going
on.

Unfortunately, the red coloring didn't make it through the mailing list, so
I'm not sure which specific line you were referencing as demonstrating data
loss.

Just in case you're concerned about the "Updating StandbyTasks failed"
warnings, they should be fine. It indicates that a thread was unable to
re-use a state store that it had previously been assigned in the past, so
instead it deletes the local data and recreates the whole thing from the
changelog.

The Streams logs that would be really useful to capture are the lifecycle
ones, like

[2018-12-14 17:34:30,326] INFO stream-thread
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980-StreamThread-1]
> State transition from RUNNING to PARTITIONS_REVOKED
> (org.apache.kafka.streams.processor.internals.StreamThread)



[2018-12-14 17:34:30,326] INFO stream-client
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980] State
> transition from RUNNING to REBALANCING
> (org.apache.kafka.streams.KafkaStreams)


Also, it would be helpful to see the assignment transitions in line with
the state transitions. Examples:

[2018-12-14 17:34:31,863] DEBUG stream-thread
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980-StreamThread-1]
> Adding assigned tasks as active: {0_2=[standbyTaskSource1-2],
> 0_5=[standbyTaskSource1-5]}
> (org.apache.kafka.streams.processor.internals.TaskManager)



[2018-12-14 17:34:31,882] DEBUG stream-thread
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980-StreamThread-1]
> Adding assigned standby tasks {0_4=[standbyTaskSource1-4],
> 0_1=[standbyTaskSource1-1]}
> (org.apache.kafka.streams.processor.internals.TaskManager)



[2018-12-14 17:34:31,885] INFO stream-thread
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980-StreamThread-1]
> partition assignment took 22 ms.
>current active tasks: [0_2, 0_5]
>current standby tasks: [0_1, 0_4]
>previous active tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)



I look forward to hearing back from you (either with more detailed logs or
just a clarification about how the given logs indicate data loss). A report
about potential data loss is very concerning.

Thanks,
-John

On Sun, Dec 30, 2018 at 9:23 AM Nitay Kufert  wrote:

> Hey everybody,
> We have been running Kafka Streams in production for the last year or so - we
> are currently using the latest version (2.1.0) and we have suffered from data
> loss several times before.
> The first time we noticed a data loss, we were able to trace it back to
> Exception that we were getting in the code - which eventually mapped to an
> open bug that the Kafka team is still working on. So the temporary solution
> was to disable the feature that causes the Exception (in this case - it was
> the "exactly_once" semantics) and move to "at_lease_once" semantics + piece
> of code that handles duplications.
> The 2nd time we noticed a data loss, we traced it back to some kind of
> Exception caused by lack of memory. To make a long story short - we hit the
> limit for open files on the machines (a lot of files are used by rocksDB) -
> so increasing the RAM of the machines & increasing the number of allowed
> open files on the OS solved this problem.
>
> Now, we are facing data loss for the 3rd time - this time it seems to
> happen when our Kafka stream instances switch (reproducible - happened 2
> separate times). let me explain:
> We are using a 3rd party company called Spotinst - which basically helps
> you save costs by monitoring the Amazon spot market, and switching between
> instances when they find a cheaper one.
>
> The question is, why would it cause data loss?
> Those are logs I collected and put together in a single timeline, including
> messages from Kafka stream instances (from Kibana), Spotinst (3rd party
> company) & the data in the compacted topic where the data should have been
> kept (basically its a compacted topic behind a reduce function - and it
> seems like the aggregated data was lost and the function was invoked as
> if it's the first time it's aggregating anything).
> What you are seeing is that Spotinst saw an increase in CPU - and initiated
> an Upscale (2 instances), and shortly after it - 2 instances went down
> (Downscale) as the load was over. In *RED* you can see the actual data loss
> (as observed from the compacted topic)
>
> DATE TIME FACILITY INFO
> 12/25/2018 5:17:03 Spotinst Instances Launched - Autoscaling: Policy Name:
> Scaling Policy-Up, Threshold: 70.0, Value 

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-07 Thread John Roesler
Hi Peter,

Sorry, I just now have seen this thread.

You asked if this behavior is unexpected, and the answer is yes.
Suppress.untilWindowCloses is intended to emit only the final result,
regardless of restarts.

You also asked how the suppression buffer can resume after a restart, since
it's not persistent.
The answer is the same as for in-memory stores. The state of the store (or
buffer, in this case)
is persisted to a changelog topic, which is re-read on restart to re-create
the exact state prior to shutdown.
"Persistent" in the store nomenclature refers only to "persistent on the
local disk".

Just to confirm your response regarding the buffer size:
While it is better to use the public ("Suppressed.unbounded()") API, yes,
your buffer was already unbounded.

I looked at your custom transformer, and it looks almost correct to me. The
only flaw seems to be that it only looks
for closed windows for the key currently being processed, which means that
if you have key "A" buffered, but don't get another event for it for a
while after the window closes, you won't emit the final result. This might
actually take longer than the window retention period, in which case, the
data would be deleted without ever emitting the final result.

You said you think it should be possible to get the DSL version working,
and I agree, since this is exactly what it was designed for. Do you mind
filing a bug in the "KAFKA" Jira project (
https://issues.apache.org/jira/secure/Dashboard.jspa)? It will be easier to
keep the investigation organized that way.

In the mean time, I'll take another look at your logs above and try to
reason about what could be wrong.

Just one clarification... For example, you showed
> [pool-1-thread-4] APP Consumed: [c@1545398874000/1545398876000] -> [14,
272, 548, 172], sum: 138902
> [pool-1-thread-4] APP Consumed: [c@1545398874000/1545398876000] -> [14,
272, 548, 172, 596, 886, 780] INSTEAD OF [14, 272, 548, 172], sum: 141164

Am I correct in thinking that the first, shorter list is the "incremental"
version, and the second is the "final" version? I think so, but am confused
by "INSTEAD OF".

Thanks for the report,
-John



On Wed, Dec 26, 2018 at 3:21 AM Peter Levart  wrote:

>
>
> On 12/21/18 3:16 PM, Peter Levart wrote:
> > I also see some results that are actual non-final window aggregations
> > that precede the final aggregations. These non-final results are never
> > emitted out of order (for example, no such non-final result would ever
> > come after the final result for a particular key/window).
>
> Absence of proof is not the proof of absence... And I have later
> observed (using the DSL variant, not the custom Transformer) an
> occurrence of a non-final result that was emitted after restart of
> streams processor while the final result for the same key/window had
> been emitted before the restart:
>
> [pool-1-thread-4] APP Consumed: [a@154581526/1545815262000] -> [550,
> 81, 18, 393, 968, 847, 452, 0, 0, 0], sum: 444856
> ...
> ... restart ...
> ...
> [pool-1-thread-4] APP Consumed: [a@154581526/1545815262000] -> [550]
> INSTEAD OF [550, 81, 18, 393, 968, 847, 452, 0, 0, 0], sum: 551648
>
>
> The app logic can not even rely on guarantee that results are ordered
> then. This is really not usable until the bug is fixed.
>
> Regards, Peter
>
>


Re: Kafka Streams 2.1.0, 3rd time data lose investigation

2019-01-07 Thread John Roesler
Hi Nitay,

> I will provide extra logs if it will happen again (I really really hope it
> won't hehe :))

Yeah, I hear you. Reproducing errors in production is a real double-edged
sword!

Thanks for the explanation. It makes sense now.

This may be grasping at straws, but it seems like your frequent rebalances
may be exposing you to this recently reported bug:
https://issues.apache.org/jira/browse/KAFKA-7672

The reporter mentioned one thing that can help identify it, which is that
it prints a message saying that it's going to re-initialize the state
store, followed immediately by a transition to "running". Perhaps you can
check your Streams logs to see if you see anything similar.

Thanks,
-John

On Sat, Jan 5, 2019 at 10:48 AM Nitay Kufert  wrote:

> Hey John,
> Thanks for the response!
>
> I will provide extra logs if it will happen again (I really really hope it
> won't hehe :))
>
> Some clarification regarding the previous mail:
> The only thing that shows the data loss is the messages from the compacted
> topic which I consumed a couple of hours after I noticed the data loss.
> This compacted topic is an output of my stream application (basically, I am
> using reduce on the same key to SUM the values, and pushing it to the
> compacted topic using ".to")
>
> The only correlation I have for those messages are the timestamps of the
> messages in the compacted topic.
> So I took logs from Spotinst & Kafka Stream instances around the same time
> and shared them here (I know correlation doesn't mean causation but that's
> the only thing I have :) )
>
> The messages showed an ever-increasing value for the specific key I was
> investigating, which was expected.
> The unexpected thing was that suddenly the value started the aggregation
> back at 0 for some reason.
>
> In an effort to understand what's going on, I added logs to the function I
> use for reducing the stream to try and log cases where this thing happens -
> but it didn't log anything.. which makes me think as if the reduce function
> "initialized" itself, meaning it acted as if it was the first message
> (nothing to aggregate the value with - so we just put the value)
>
> In the example I have shared, I have keys in the format cdr_ with
> values which are BigDecimal numbers.
> I could have shared the thousands of messages I consumed from the topic
> before reaching the value 1621.72, It would have looked something like :
> cdr_44334 -> 1619.32
> cdr_44334 -> 1619.72
> cdr_44334 -> 1620.12
> cdr_44334 -> 1620.52
> cdr_44334 -> 1620.92
> cdr_44334 -> 1621.32
> cdr_44334 -> 1621.72
> cdr_44334 -> 0.27
> cdr_44334 -> 0.67
> cdr_44334 -> 1.07
>
> So basically, the only thing that shows the loss is the sudden decrease in
> value in a specific key (I had thousands of keys who lost their value - but
> many many more that didn't lose their value).
> (I am monitoring those changes using datadog, so I know which keys are
> affected and I can investigate them)
>
> Let me know if you need some more details or if you want me to escalate
> this situation to a jira
>
> Thanks again
>
>
>
> On Thu, Jan 3, 2019 at 11:36 PM John Roesler  wrote:
>
> > Hi Nitay,
> >
> > I'm sorry to hear of these troubles; it sounds frustrating.
> >
> > No worries about spamming the list, but it does sound like this might be
> > worth tracking as a bug report in Jira.
> > Obviously, we do not expect to lose data when instances come and go,
> > regardless of the frequency, and we do have tests in place to verify
> this.
> > Of course, you might be exercising something that our tests miss.
> >
> > Thanks for collating the logs. It really helps to understand what's going
> > on.
> >
> > Unfortunately, the red coloring didn't make it through the mailing list,
> so
> > I'm not sure which specific line you were referencing as demonstrating
> data
> > loss.
> >
> > Just in case you're concerned about the "Updating StandbyTasks failed"
> > warnings, they should be fine. It indicates that a thread was unable to
> > re-use a state store that it had previously been assigned in the past, so
> > instead it deletes the local data and recreates the whole thing from the
> > changelog.
> >
> > The Streams logs that would be really useful to capture are the lifecycle
> > ones, like
> >
> > [2018-12-14 17:34:30,326] INFO stream-thread
> > >
> >
> [kafka-streams-standby-tasks-75ca0cca-cc0b-4524-843c-2d9d1d555980-StreamThread-1]
> > > State transition from RUNNING to PARTITIONS_

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-10 Thread John Roesler
Hi Peter,

Regarding retention, I was not referring to log retention, but to the
window store retention.
Since a new window is created every second (for example), there are in
principle an unbounded
number of windows (the longer the application runs, the more windows there
are, with no end).
However, we obviously can't store an infinite amount of data, so the window
definition includes
a retention period. By default, this is 24 hours. After the retention
period elapses, all of the data
for the window is purged to make room for new windows.

So what I meant was that if you buffer some key "A" in window (Monday
09:00:00) and then get
no further activity for A for over 24 hours, then when you do get that next
event for A, say at
(Tuesday 11:00:00), you'd do the scan but find nothing, since your buffered
state would already
have been purged from the store.

The way I avoided this problem for Suppression was to organize the data by
timestamp instead
of by key, so on *every* update I can search for all the keys that are old
enough and emit them.
I also don't use a window store, so I don't have to worry about the
retention time.
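As a rough sketch of that timestamp-first layout (plain Java, no Kafka dependencies; the class and method names are invented, and a real implementation would also replace any earlier buffered entry for the same key):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class TimestampOrderedBuffer {

    // Entries are indexed by buffer timestamp first, so a single pass over
    // the head of the map finds every record that is "old enough" to emit,
    // regardless of which key the current incoming event has.
    private final TreeMap<Long, Map<String, String>> byTime = new TreeMap<>();

    public void put(long timestampMs, String key, String value) {
        byTime.computeIfAbsent(timestampMs, t -> new HashMap<>()).put(key, value);
    }

    // Emit and remove everything buffered at or before the watermark.
    public Map<String, String> evictUpTo(long watermarkMs) {
        Map<String, String> emitted = new LinkedHashMap<>();
        Iterator<Map.Entry<Long, Map<String, String>>> it =
                byTime.headMap(watermarkMs, true).entrySet().iterator();
        while (it.hasNext()) {
            emitted.putAll(it.next().getValue());
            it.remove();   // the headMap view writes through to the TreeMap
        }
        return emitted;
    }

    public static void main(String[] args) {
        TimestampOrderedBuffer buffer = new TimestampOrderedBuffer();
        buffer.put(1_000L, "A", "a");
        buffer.put(5_000L, "B", "b");
        System.out.println(buffer.evictUpTo(3_000L));   // {A=a}
        System.out.println(buffer.evictUpTo(9_000L));   // {B=b}
    }
}
```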

To answer your question about the window store's topic, it configures a
retention time the same
length as the store's retention time (and the keys are the full windowed
key including the window
start time), so it'll have roughly the same size bound as the store itself.

Back to the process of figuring out what might be wrong with Suppression, I
don't suppose you
would be able to file a Jira and upload a repro program? If not, that's ok.
I haven't been able to
reproduce the bug yet, but it seems like it's happening somewhat
consistently for you, so I should
be able to get it to happen eventually.

Thanks, and sorry again for the troubles.
-John

On Tue, Jan 8, 2019 at 6:48 AM Peter Levart  wrote:

>
>
> On 1/8/19 12:57 PM, Peter Levart wrote:
> > Hi John,
> >
> > On 1/8/19 12:45 PM, Peter Levart wrote:
> >>> I looked at your custom transformer, and it looks almost correct to
> >>> me. The
> >>> only flaw seems to be that it only looks
> >>> for closed windows for the key currently being processed, which
> >>> means that
> >>> if you have key "A" buffered, but don't get another event for it for a
> >>> while after the window closes, you won't emit the final result. This
> >>> might
> >>> actually take longer than the window retention period, in which
> >>> case, the
> >>> data would be deleted without ever emitting the final result.
> >>
> >> So in the DSL case, the suppression works by flushing *all* of the "ripe"
> >> windows in the whole buffer whenever a single event comes in with
> >> recent enough timestamp regardless of the key of that event?
> >>
> >> Is the buffer shared among processing tasks or does each task
> >> maintain its own private buffer that only contains its share of data
> >> pertaining to assigned input partitions? In case the tasks are
> >> executed on several processing JVM(s) the buffer can't really be
> >> shared, right? In that case a single event can't flush all of the
> >> "ripe" windows, but just those that are contained in the task's part
> >> of buffer...
> >
> > Just a question about your comment above:
> >
> > /"This might actually take longer than the window retention period, in
> > which case, the data would be deleted without ever emitting the final
> > result"/
> >
> > Are you talking about the buffer log topic retention? Aren't log
> > topics configured to "compact" rather than "delete" messages? So the
> > last "version" of the buffer entry for a particular key should stay
> > forever? What are the keys in suppression buffer log topic? Are they a
> > pair of (timestamp, key) ? Probably not since in that case the
> > compacted log would grow indefinitely...
> >
> > Another question:
> >
> > What are the keys in WindowStore's log topic? If the input keys to the
> > processor that uses such WindowStore consist of a bounded set of
> > values (for example user ids), would compacted log of such WindowStore
> > also be bounded?
>
> In case the key of WindowStore log topic is (timestamp, key) then would
> explicitly deleting flushed entries from WindowStore (by putting null
> value into the store) keep the compacted log bounded? In other words,
> does WindowStore log topic support a special kind of "tombstone" message
> that effectively removes the key from the compacted log?
>
> In that case, my custom processor could keep entries in its WindowStore
> for as log as needed, depending on the activity of a particular input
> key...
>
> >
> > Regards, Peter
> >
> >
>
>


Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-14 Thread John Roesler
Hi Peter,

I see your train of thought, but the actual implementation of the
window store is structured differently from your mental model.
Unlike Key/Value stores, we know that the records in a window
store will "expire" on a regular schedule, and also that every single
record will eventually expire. With this in mind, we have implemented
an optimization to avoid a lot of compaction overhead in RocksDB, as
well as saving on range scans.

Instead of storing everything in one database, we open several
databases and bucket windows into them. Then, when windows
expire, we just ignore the records (i.e., the API makes them unreachable,
but we don't actually delete them). Once all the windows in a database
are expired, we just close and delete the whole database. Then, we open
a new one for new windows. If you look in the code, these databases are
called "segments".

Thus, I don't think that you should attempt to use the built-in window
stores
as you described. Instead, it should be straightforward to implement your
own StateStore with a layout that's more favorable to your desired behavior.

You should also be able to set up the change log the way you need as well.
Explicitly removed entities also would get removed from the log as well, if
it's a compacted log.

Actually, what you're describing is *very* similar to the implementation
for suppress. I might actually suggest that you just copy the suppression
implementation and adapt it to your needs, or at the very least, study
how it works. In doing so, you might actually discover the cause of the
bug yourself!

I hope this helps, and thanks for your help,
-John


On Sat, Jan 12, 2019 at 5:45 AM Peter Levart  wrote:

> Hi John,
>
> Thank you very much for explaining how WindowStore works. I have some
> more questions...
>
> On 1/10/19 5:33 PM, John Roesler wrote:
> > Hi Peter,
> >
> > Regarding retention, I was not referring to log retention, but to the
> > window store retention.
> > Since a new window is created every second (for example), there are in
> > principle an unbounded
> > number of windows (the longer the application runs, the more windows
> there
> > are, with no end).
> > However, we obviously can't store an infinite amount of data, so the
> window
> > definition includes
> > a retention period. By default, this is 24 hours. After the retention
> > period elapses, all of the data
> > for the window is purged to make room for new windows.
>
> Right. Would the following work for example:
>
> - configure retention of WindowStore to be "infinite"
> - explicitly remove records from the store when windows are flushed out
> - configure WindowStore log topic for compacting
>
> Something like the following:
>
>  Stores
>  .windowStoreBuilder(
>  Stores.persistentWindowStore(
>  storeName,
>  Duration.of(1000L, ChronoUnit.YEARS), // retentionPeriod
>  Duration.ofSeconds(10), // windowSize
>  false
>  ),
>  keySerde, valSerde
>  )
>  .withCachingEnabled()
>  .withLoggingEnabled(
>  Map.of(
>  TopicConfig.CLEANUP_POLICY_CONFIG,
> TopicConfig.CLEANUP_POLICY_COMPACT
>  )
>  );
>
> Would in above scenario:
>
> - the on-disk WindowStore be kept bounded (there could be some very old
> entries in it but majority will be new - depending on the activity of
> particular input keys)
> - the log topic be kept bounded (explicitly removed entries would be
> removed from compacted log too)
>
> I'm moving away from DSL partly because I have some problems with
> suppression (which I hope we'll be able to fix) and partly because the
> DSL can't give me the complicated semantics that I need for the
> application at hand. I tried to capture what I need in a custom
> Transformer here:
>
> https://gist.github.com/plevart/d3f70bee7346f72161ef633aa60dc94f
>
> Your knowledge of how WindowStore works would greatly help me decide if
> this is a workable idea.
>
> >
> > So what I meant was that if you buffer some key "A" in window (Monday
> > 09:00:00) and then get
> > no further activity for A for over 24 hours, then when you do get that
> next
> > event for A, say at
> > (Tuesday 11:00:00), you'd do the scan but find nothing, since your
> buffered
> > state would already
> > have been purged from the store.
>
> Right. That would be the case when WindowStore was configured with
> default retention of 24 hours. A quick question: What does window s

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-22 Thread John Roesler
Hi Peter,

Just to follow up on the actual bug, can you confirm whether:
* when you say "restart", do you mean orderly shutdown and restart, or
crash and restart?
* have you tried this with EOS enabled? I can imagine some ways that there
could be duplicates, but they should be impossible with EOS enabled.

Thanks for your help,
-John

On Mon, Jan 14, 2019 at 1:20 PM John Roesler  wrote:

> Hi Peter,
>
> I see your train of thought, but the actual implementation of the
> window store is structured differently from your mental model.
> Unlike Key/Value stores, we know that the records in a window
> store will "expire" on a regular schedule, and also that every single
> record will eventually expire. With this in mind, we have implemented
> an optimization to avoid a lot of compaction overhead in RocksDB, as
> well as saving on range scans.
>
> Instead of storing everything in one database, we open several
> databases and bucket windows into them. Then, when windows
> expire, we just ignore the records (i.e., the API makes them unreachable,
> but we don't actually delete them). Once all the windows in a database
> are expired, we just close and delete the whole database. Then, we open
> a new one for new windows. If you look in the code, these databases are
> called "segments".
>
> Thus, I don't think that you should attempt to use the built-in window
> stores
> as you described. Instead, it should be straightforward to implement your
> own StateStore with a layout that's more favorable to your desired
> behavior.
>
> You should also be able to set up the change log the way you need as well.
> Explicitly removed entities also would get removed from the log as well, if
> it's a compacted log.
>
> Actually, what you're describing is *very* similar to the implementation
> for suppress. I might actually suggest that you just copy the suppression
> implementation and adapt it to your needs, or at the very least, study
> how it works. In doing so, you might actually discover the cause of the
> bug yourself!
>
> I hope this helps, and thanks for your help,
> -John
>
>
> On Sat, Jan 12, 2019 at 5:45 AM Peter Levart 
> wrote:
>
>> Hi John,
>>
>> Thank you very much for explaining how WindowStore works. I have some
>> more questions...
>>
>> On 1/10/19 5:33 PM, John Roesler wrote:
>> > Hi Peter,
>> >
>> > Regarding retention, I was not referring to log retention, but to the
>> > window store retention.
>> > Since a new window is created every second (for example), there are in
>> > principle an unbounded
>> > number of windows (the longer the application runs, the more windows
>> there
>> > are, with no end).
>> > However, we obviously can't store an infinite amount of data, so the
>> window
>> > definition includes
>> > a retention period. By default, this is 24 hours. After the retention
>> > period elapses, all of the data
>> > for the window is purged to make room for new windows.
>>
>> Right. Would the following work for example:
>>
>> - configure retention of WindowStore to be "infinite"
>> - explicitly remove records from the store when windows are flushed out
>> - configure WindowStore log topic for compacting
>>
>> Something like the following:
>>
>>  Stores
>>  .windowStoreBuilder(
>>  Stores.persistentWindowStore(
>>  storeName,
>>  Duration.of(1000L, ChronoUnit.YEARS), // retentionPeriod
>>  Duration.ofSeconds(10), // windowSize
>>  false
>>  ),
>>  keySerde, valSerde
>>  )
>>  .withCachingEnabled()
>>  .withLoggingEnabled(
>>  Map.of(
>>  TopicConfig.CLEANUP_POLICY_CONFIG,
>> TopicConfig.CLEANUP_POLICY_COMPACT
>>  )
>>  );
>>
>> Would in above scenario:
>>
>> - the on-disk WindowStore be kept bounded (there could be some very old
>> entries in it but majority will be new - depending on the activity of
>> particular input keys)
>> - the log topic be kept bounded (explicitly removed entries would be
>> removed from compacted log too)
>>
>> I'm moving away from DSL partly because I have some problems with
>> suppression (which I hope we'll be able to fix) and partly because the
>> DSL can't give me the complicated semantics that I need for the
>> application at

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-24 Thread John Roesler
Hi Peter,

Thanks for the clarification.

When you hit the "stop" button, AFAIK it does send a SIGTERM, but I don't
think that Streams automatically registers a shutdown hook. In our examples
and demos, we register a shutdown hook "outside" of streams (right next to
the code that calls start() ).
Unless I missed something, a SIGTERM would still cause Streams to exit
abruptly, skipping flush and commit. This can cause apparent duplicates *if
you're not using EOS or if you're reading uncommitted transactions*.
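The hook registration described above — done "outside" of Streams, right next to the code that calls start() — can be sketched as follows. This is a minimal sketch: KafkaStreams itself is omitted so the snippet stays self-contained, and in a real app the onClose runnable would be () -> streams.close().

```java
public class StreamsShutdown {
    // Registers a JVM shutdown hook so that SIGTERM triggers an orderly
    // close (flush + commit) instead of an abrupt exit. In a real app,
    // onClose would call KafkaStreams#close().
    static Thread registerShutdownHook(Runnable onClose) {
        Thread hook = new Thread(onClose, "streams-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        Thread hook = registerShutdownHook(() -> System.out.println("closing streams"));
        // removeShutdownHook returns true, proving the hook was registered
        System.out.println(Runtime.getRuntime().removeShutdownHook(hook));
    }
}
```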

The reason is that, upon restart, the suppression buffer can only
"remember" what got sent & committed to its changelog topic before.

The scenario I have in mind is:

...
* buffer state X
...
* flush state X to buffer changelog
...
* commit transaction T0; start new transaction T1
...
* emit final result X (in uncommitted transaction T1)
...
* crash before flushing to the changelog the fact that state X was emitted.
Also, transaction T1 gets aborted, since we crash before committing.
...
* restart, restoring state X again from the changelog (because the emit
didn't get committed)
* start transaction T2
* emit final result X again (in uncommitted transaction T2)
...
* commit transaction T2
...

So, the result gets emitted twice, but the first time is in an aborted
transaction. This leads me to another clarifying question:

Based on your first message, it seems like the duplicates you observe are
in the output topic. When you read the topic, do you configure your
consumer with "read committed" mode? If not, you'll see "results" from
uncommitted transactions, which could explain the duplicates.

Likewise, if you were to attach a callback, like "foreach" downstream of
the suppression, you would see duplicates in the case of a crash. Callbacks
are a general "hole" in EOS, which I have some ideas to close, but that's a
separate topic.

There may still be something else going on, but I'm trying to start with
the simpler explanations.

Thanks again,
-John


On Wed, Jan 23, 2019 at 5:11 AM Peter Levart  wrote:

> Hi John,
>
> Sorry I haven't had time to prepare the minimal reproducer yet. I still
> have plans to do it though...
>
> On 1/22/19 8:02 PM, John Roesler wrote:
> > Hi Peter,
> >
> > Just to follow up on the actual bug, can you confirm whether:
> > * when you say "restart", do you mean orderly shutdown and restart, or
> > crash and restart?
>
> I start it as SpringBoot application from IDEA and then stop it with the
> red square button. It does initiate the shutdown sequence before
> exiting... So I think it is by SIGTERM which initiates JVM shutdown
> hook(s).
>
> > * have you tried this with EOS enabled? I can imagine some ways that
> there
> > could be duplicates, but they should be impossible with EOS enabled.
>
> Yes, I have EOS enabled.
>
> >
> > Thanks for your help,
> > -John
>
> Regards, Peter
>
> >
> > On Mon, Jan 14, 2019 at 1:20 PM John Roesler  wrote:
> >
> >> Hi Peter,
> >>
> >> I see your train of thought, but the actual implementation of the
> >> window store is structured differently from your mental model.
> >> Unlike Key/Value stores, we know that the records in a window
> >> store will "expire" on a regular schedule, and also that every single
> >> record will eventually expire. With this in mind, we have implemented
> >> an optimization to avoid a lot of compaction overhead in RocksDB, as
> >> well as saving on range scans.
> >>
> >> Instead of storing everything in one database, we open several
> >> databases and bucket windows into them. Then, when windows
> >> expire, we just ignore the records (i.e., the API makes them
> unreachable,
> >> but we don't actually delete them). Once all the windows in a database
> >> are expired, we just close and delete the whole database. Then, we open
> >> a new one for new windows. If you look in the code, these databases are
> >> called "segments".
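The segment bucketing described above boils down to integer division of the window start by a segment interval. A sketch of the arithmetic only (the interval value below is illustrative; the real store derives it from the retention period):

```java
public class SegmentsSketch {
    // Each window lands in the segment that holds its start timestamp.
    // Expiring a segment drops every window bucketed into it at once,
    // avoiding per-record deletes and RocksDB compaction overhead.
    static long segmentId(long windowStartMs, long segmentIntervalMs) {
        return windowStartMs / segmentIntervalMs;
    }

    public static void main(String[] args) {
        long interval = 60_000L; // illustrative segment interval: 1 minute
        System.out.println(segmentId(0L, interval));       // 0
        System.out.println(segmentId(59_999L, interval));  // 0
        System.out.println(segmentId(120_000L, interval)); // 2
    }
}
```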
> >>
> >> Thus, I don't think that you should attempt to use the built-in window
> >> stores
> >> as you described. Instead, it should be straightforward to implement
> your
> >> own StateStore with a layout that's more favorable to your desired
> >> behavior.
> >>
> >> You should also be able to set up the change log the way you need as
> well.
> >> Explicitly removed entities would get removed from the log as well, if
> >> it's a compacted log.
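The compaction semantics being relied on here can be simulated with a map keyed like the changelog. A sketch only — real compaction happens broker-side, asynchronously, per partition:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactedLogSketch {
    // A compacted topic logically retains only the latest value per key;
    // producing a null value (a "tombstone") deletes the key entirely.
    static Map<String, String> compact(String[][] records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] r : records) {
            if (r[1] == null) {
                latest.remove(r[0]); // tombstone: explicit removal
            } else {
                latest.put(r[0], r[1]);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<String, String> m = compact(new String[][] {
            {"k", "v1"}, {"k", "v2"}, {"x", "y"}, {"x", null}
        });
        System.out.println(m); // {k=v2}
    }
}
```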
> >>
> >> Actually, what you're describing is *very* similar

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-25 Thread John Roesler
Hi Peter,

Thanks for the replies.

Regarding transactions:
Yes, actually, with EOS enabled, the changelog and the output topics are
all produced with the same transactional producer, within the same
transactions. So it should already be atomic.

Regarding restore:
Streams doesn't put the store into service until the restore is completed,
so it should be guaranteed not to happen. But there's of course no
guarantee that I didn't mess something up. I'll take a hard look at it.

Regarding restoration and offsets:
Your guess is correct: Streams tracks the latest stored offset outside of
the store implementation itself, specifically by writing a file (called a
Checkpoint File) in the state directory. If the file is there, it reads
that offset and restores from that point. If the file is missing, it
restores from the beginning of the stream. So it should "just work" for
you. Just for completeness, there have been several edge cases discovered
where this mechanism isn't completely safe, so in the case of EOS, I
believe we actually disregard that checkpoint file and the prior state and
always rebuild from the earliest offset in the changelog.

Personally, I would like to see us provide the ability to store the
checkpoint inside the state store, so that checkpoint updates are
linearized correctly w.r.t. data updates, but I actually haven't mentioned
this thought to anyone until now ;)
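The checkpoint-file decision described above can be sketched with plain file I/O. This models only the decision (the real file format, naming, and location in the state directory differ):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

public class CheckpointSketch {
    // A present checkpoint file yields the offset to restore from; a
    // missing file means "restore from the beginning of the changelog".
    static OptionalLong restoreFrom(Path checkpointFile) throws IOException {
        if (!Files.exists(checkpointFile)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(Files.readString(checkpointFile).trim()));
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("checkpoint", null);
        Files.writeString(f, "42\n");
        System.out.println(restoreFrom(f)); // OptionalLong[42]
        Files.delete(f);
        System.out.println(restoreFrom(f).isEmpty()); // true
    }
}
```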

Finally, regarding your prior email:
Yes, I was thinking that the "wrong" output values might be part of
rolled-back transactions and therefore enabling read-committed mode on the
consumer might tell a different story than what you've seen to date.

I'm honestly still baffled about those intermediate results that are
sneaking out. I wonder if it's something specific to your data stream, like
maybe if there is maybe an edge case when two records have exactly the same
timestamp? I'll have to stare at the code some more...

Regardless, in order to reap the benefits of running the app with EOS, you
really have to also set your consumers to read_committed. Otherwise, you'll
be seeing output data from aborted (aka rolled-back) transactions, and you
miss the intended "exactly once" guarantee.
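For reference, opting in to read_committed is a single consumer setting. A minimal sketch using plain string keys rather than the ConsumerConfig constants:

```java
import java.util.Properties;

public class ReadCommittedConfig {
    // Consumers default to isolation.level=read_uncommitted; downstream
    // consumers of an EOS Streams app must set read_committed so records
    // from aborted (rolled-back) transactions are filtered out.
    static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            consumerProps("localhost:9092").getProperty("isolation.level"));
        // prints read_committed
    }
}
```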

Thanks,
-John

On Fri, Jan 25, 2019 at 1:51 AM Peter Levart  wrote:

> Hi John,
>
> Haven't been able to reinstate the demo yet, but I have been re-reading
> the following scenario of yours....
>
> On 1/24/19 11:48 PM, Peter Levart wrote:
> > Hi John,
> >
> > On 1/24/19 3:18 PM, John Roesler wrote:
> >
> >>
> >> The reason is that, upon restart, the suppression buffer can only
> >> "remember" what got sent & committed to its changelog topic before.
> >>
> >> The scenario I have in mind is:
> >>
> >> ...
> >> * buffer state X
> >> ...
> >> * flush state X to buffer changelog
> >> ...
> >> * commit transaction T0; start new transaction T1
> >> ...
> >> * emit final result X (in uncommitted transaction T1)
> >> ...
> >> * crash before flushing to the changelog the fact that state X was
> >> emitted.
> >> Also, transaction T1 gets aborted, since we crash before committing.
> >> ...
> >> * restart, restoring state X again from the changelog (because the emit
> >> didn't get committed)
> >> * start transaction T2
> >> * emit final result X again (in uncommitted transaction T2)
> >> ...
> >> * commit transaction T2
> >> ...
> >>
> >> So, the result gets emitted twice, but the first time is in an aborted
> >> transaction. This leads me to another clarifying question:
> >>
> >> Based on your first message, it seems like the duplicates you observe
> >> are
> >> in the output topic. When you read the topic, do you configure your
> >> consumer with "read committed" mode? If not, you'll see "results" from
> >> uncommitted transactions, which could explain the duplicates.
>
> ...and I was thinking that perhaps the right solution to the suppression
> problem would be to use transactional producers for the resulting output
> topic AND the store change-log. Is this possible? Does the compaction of
> the log on the brokers work for transactional producers as expected? In
> that case, the sending of final result and the marking of that fact in
> the store change log would together be an atomic operation.
> That said, I think there's another problem with suppression which looks
> like the supression processor is already processing the input while the
> state store has not been fully restored yet or something related... Is
> this guaranteed not

Re: DSL - Deliver through a table and then to a stream?

2019-02-15 Thread John Roesler
Hi Trey,

I think there is a ticket open requesting to be able to re-use the source
topic, so I don't think it's an intentional restriction, just a consequence
of the way the code is structured at the moment.

Is it sufficient to send the update to "calls" and "answered-calls" at the
same time? You could do something like:

val answeredCalls =
 actions.filter { _, action -> action == Actions.ANSWER }
  .join(callsTable) { id, call -> call }  // now a KTable
  .mapValues { call -> doAnswer(call) } // actual answer implementation

answeredCalls.to("calls");
answeredCalls.to("answered-calls");

Does that help?

-John


On Fri, Feb 15, 2019 at 4:18 PM Trey Hutcheson 
wrote:

> For context, imagine I'm building an IVR simulator. Desired workflow:
>
> IVR knows about a ringing call. IVR receives an IPC instruction to answer
> the call. That instruction is realized by sending a message {action=ANSWER}
> to the "actions" topic.
>
> At this point, the system needs to do two things: actually answer the call,
> and then start a recording of the call, in that order. Because of
> implementation peculiarities external to the system, assume that these two
> things cannot be executed together atomically.
>
> So this is what I'd *like* to do (warning, kotlin code, types omitted for
> brevity):
>
> val callsTable = builder.table("calls", ...)
> val actions = builder.stream("actions", ..)
>
> actions.filter { _, action -> action == Actions.ANSWER }
>   .join(callsTable) { id, call -> call }  // now a KTable
>   .mapValues { call -> doAnswer(call) } // actual answer implementation
>   .through("calls") // persist in state store
>   .to("answered-calls") // let other actors in the system know the call was
> answered, such as start the recording process
>
> Now in the current version of the streams library (2.1.0), that little bit
> of topology throws an exception when trying to build it, with a message
> that a source has already been defined for the "calls" topic. So apparently
> the call to .through materializes a view and defines a source, which was
> already defined in the call to builder.table("calls")?
>
> So how do I do what I want? This sequence needs to happen in order. I have
> tried .branch, but that just ends up in a race condition (the thing doing
> to recording has to join to calls table and filter that the call has been
> answered).
>
> I could create a custom processor that forwards to both sinks - but does
> that really solve the problem? And if it did, how do I create a
> KafkaStreams instance from a combination of StreamBuilder and Topology?
>
> Thanks for the insight
> Trey
>


Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-02-26 Thread John Roesler
Hi again, Peter,

Just to close the loop about the bug in Suppress, we did get the (apparent)
same report from a few other people:
https://issues.apache.org/jira/browse/KAFKA-7895

I also managed to reproduce the duplicate-result behavior, which could
cause it to emit both intermediate results and duplicate final results.

There's a patch for it in the 2.2 release candidate. Perhaps you can try it
out and see if it resolves the issue for you?

I'm backporting the fix to 2.1 as well, but I unfortunately missed the last
2.1 bugfix release.

Thanks,
-John

On Fri, Jan 25, 2019 at 10:23 AM John Roesler  wrote:

> Hi Peter,
>
> Thanks for the replies.
>
> Regarding transactions:
> Yes, actually, with EOS enabled, the changelog and the output topics are
> all produced with the same transactional producer, within the same
> transactions. So it should already be atomic.
>
> Regarding restore:
> Streams doesn't put the store into service until the restore is completed,
> so it should be guaranteed not to happen. But there's of course no
> guarantee that I didn't mess something up. I'll take a hard look at it.
>
> Regarding restoration and offsets:
> Your guess is correct: Streams tracks the latest stored offset outside of
> the store implementation itself, specifically by writing a file (called a
> Checkpoint File) in the state directory. If the file is there, it reads
> that offset and restores from that point. If the file is missing, it
> restores from the beginning of the stream. So it should "just work" for
> you. Just for completeness, there have been several edge cases discovered
> where this mechanism isn't completely safe, so in the case of EOS, I
> believe we actually disregard that checkpoint file and the prior state and
> always rebuild from the earliest offset in the changelog.
>
> Personally, I would like to see us provide the ability to store the
> checkpoint inside the state store, so that checkpoint updates are
> linearized correctly w.r.t. data updates, but I actually haven't mentioned
> this thought to anyone until now ;)
>
> Finally, regarding your prior email:
> Yes, I was thinking that the "wrong" output values might be part of
> rolled-back transactions and therefore enabling read-committed mode on the
> consumer might tell a different story than what you've seen to date.
>
> I'm honestly still baffled about those intermediate results that are
> sneaking out. I wonder if it's something specific to your data stream, like
> maybe if there is maybe an edge case when two records have exactly the same
> timestamp? I'll have to stare at the code some more...
>
> Regardless, in order to reap the benefits of running the app with EOS, you
> really have to also set your consumers to read_committed. Otherwise, you'll
> be seeing output data from aborted (aka rolled-back) transactions, and you
> miss the intended "exactly once" guarantee.
>
> Thanks,
> -John
>
> On Fri, Jan 25, 2019 at 1:51 AM Peter Levart 
> wrote:
>
>> Hi John,
>>
>> Haven't been able to reinstate the demo yet, but I have been re-reading
>> the following scenario of yours
>>
>> On 1/24/19 11:48 PM, Peter Levart wrote:
>> > Hi John,
>> >
>> > On 1/24/19 3:18 PM, John Roesler wrote:
>> >
>> >>
>> >> The reason is that, upon restart, the suppression buffer can only
>> >> "remember" what got sent & committed to its changelog topic before.
>> >>
>> >> The scenario I have in mind is:
>> >>
>> >> ...
>> >> * buffer state X
>> >> ...
>> >> * flush state X to buffer changelog
>> >> ...
>> >> * commit transaction T0; start new transaction T1
>> >> ...
>> >> * emit final result X (in uncommitted transaction T1)
>> >> ...
>> >> * crash before flushing to the changelog the fact that state X was
>> >> emitted.
>> >> Also, transaction T1 gets aborted, since we crash before committing.
>> >> ...
>> >> * restart, restoring state X again from the changelog (because the emit
>> >> didn't get committed)
>> >> * start transaction T2
>> >> * emit final result X again (in uncommitted transaction T2)
>> >> ...
>> >> * commit transaction T2
>> >> ...
>> >>
>> >> So, the result gets emitted twice, but the first time is in an aborted
>> >> transaction. This leads me to another clarifying question:
>> >>
>> >> Based on your first message, it seems like the 

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-03-04 Thread John Roesler
Hi Jonathan,

Sorry to hear that the feature is causing you trouble as well, and that the
2.2 release candidate didn't seem to fix it.

I'll try and do a repro based on the code in your SO post tomorrow.

I don't think it's related to the duplicates, but that shutdown error is
puzzling. Can you print the topology (with topology.describe() ) ? This
will tell us what is in task 1 (i.e., *1_*) of your program.

Thanks,
-John

On Fri, Mar 1, 2019 at 11:33 AM Jonathan Santilli <
jonathansanti...@gmail.com> wrote:

> BTW, after stopping the app gracefully (Stream#close()), this error shows
> up repeatedly:
>
> 2019-03-01 17:18:07,819 WARN
> [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
> internals.ProcessorStateManager (ProcessorStateManager.java:327) - task
> [0_0] Failed to write offset checkpoint file to
> [/tmp/kafka-stream/XXX/0_0/.checkpoint]
>
> java.io.FileNotFoundException: /tmp/kafka-stream/XXX/0_0/.checkpoint.tmp
> (No such file or directory)
>
> at java.io.FileOutputStream.open0(Native Method) ~[?:1.8.0_191]
>
> at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[?:1.8.0_191]
>
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> ~[?:1.8.0_191]
>
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> ~[?:1.8.0_191]
>
> at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(
> OffsetCheckpoint.java:79) ~[kafka-streams-2.2.0.jar:?]
>
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(
> ProcessorStateManager.java:325) [kafka-streams-2.2.0.jar:?]
>
> at org.apache.kafka.streams.processor.internals.StreamTask.suspend(
> StreamTask.java:599) [kafka-streams-2.2.0.jar:?]
>
> at org.apache.kafka.streams.processor.internals.StreamTask.close(
> StreamTask.java:721) [kafka-streams-2.2.0.jar:?]
>
> at org.apache.kafka.streams.processor.internals.AssignedTasks.close(
> AssignedTasks.java:337) [kafka-streams-2.2.0.jar:?]
>
> at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(
> TaskManager.java:267) [kafka-streams-2.2.0.jar:?]
>
> at
> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(
> StreamThread.java:1209) [kafka-streams-2.2.0.jar:?]
>
> at org.apache.kafka.streams.processor.internals.StreamThread.run(
> StreamThread.java:786) [kafka-streams-2.2.0.jar:?]
>
>
> However, I have checked and the folder created starts with: *1_*
>
> ls -lha /tmp/kafka-stream/XXX/1_1
> total 8
> drwxr-xr-x   5 a  b   160B  1 Mar 17:18 .
> drwxr-xr-x  34 a  b   1.1K  1 Mar 17:15 ..
> -rw-r--r--   1 a  b   2.9K  1 Mar 17:18 .checkpoint
> -rw-r--r--   1 a  b 0B  1 Mar 16:05 .lock
> drwxr-xr-x   3 a  b96B  1 Mar 16:43
> KSTREAM-REDUCE-STATE-STORE-05
>
>
>
> Cheers!
> --
> Jonathan
>
>
>
> On Fri, Mar 1, 2019 at 5:11 PM Jonathan Santilli <
> jonathansanti...@gmail.com>
> wrote:
>
> > Hello John, hope you are well.
> > I have tested the version 2.2 release candidate (although I know it has
> > been postponed).
> > I have been following this email thread because I think I am experiencing
> > the same issue. I have reported in an email to this list and also all the
> > details are in SO (
> >
> https://stackoverflow.com/questions/54145281/why-do-the-offsets-of-the-consumer-group-app-id-of-my-kafka-streams-applicatio
> > ).
> >
> > After the test, the result is the same as before (at least for my case),
> > already processed records are passed again to the output topic causing
> the
> > data duplication:
> >
> > ...
> > 2019-03-01 16:55:23,808 INFO
> [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
> > internals.StoreChangelogReader (StoreChangelogReader.java:221) -
> > stream-thread [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
> No
> > checkpoint found for task 1_10 state store
> > KTABLE-SUPPRESS-STATE-STORE-11 changelog
> > XXX-KTABLE-SUPPRESS-STATE-STORE-11-changelog-10 with EOS turned
> on. *Reinitializing
> > the task and restore its state from the beginning.*
> >
> > ...
> >
> >
> > I was hoping for this to be fixed, but is not the case, at least for my
> > case.
> >
> > If you can, please take a look at the question on SO. I was in contact
> > with Matthias about it; he also pointed me to the place where the
> > potential bug could be present.
> >
> > Please, let me know any thoughts.
> >
> >
> > Cheers!
> > --
> > Jonathan
> >
> >
> > On Tue, Feb 26, 2019 at 5:23 PM John Roesler  wrote:
> >
> >> Hi again, Peter,
> >>

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-03-05 Thread John Roesler
Hi Jonathan,

Just a quick update: I have not been able to reproduce the duplicates issue
with the 2.2 RC, even with a topology very similar to the one you included
in your stackoverflow post.

I think we should treat this as a new bug. Would you mind opening a new
Jira bug ticket with some steps to reproduce the problem, and also exactly
the behavior you observe?

Thanks,
-John

On Mon, Mar 4, 2019 at 10:41 PM John Roesler  wrote:

> Hi Jonathan,
>
> Sorry to hear that the feature is causing you trouble as well, and that
> the 2.2 release candidate didn't seem to fix it.
>
> I'll try and do a repro based on the code in your SO post tomorrow.
>
> I don't think it's related to the duplicates, but that shutdown error is
> puzzling. Can you print the topology (with topology.describe() ) ? This
> will tell us what is in task 1 (i.e., *1_*) of your program.
>
> Thanks,
> -John
>
> On Fri, Mar 1, 2019 at 11:33 AM Jonathan Santilli <
> jonathansanti...@gmail.com> wrote:
>
>> BTW, after stopping the app gracefully (Stream#close()), this error shows
>> up repeatedly:
>>
>> 2019-03-01 17:18:07,819 WARN
>> [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
>> internals.ProcessorStateManager (ProcessorStateManager.java:327) - task
>> [0_0] Failed to write offset checkpoint file to
>> [/tmp/kafka-stream/XXX/0_0/.checkpoint]
>>
>> java.io.FileNotFoundException: /tmp/kafka-stream/XXX/0_0/.checkpoint.tmp
>> (No such file or directory)
>>
>> at java.io.FileOutputStream.open0(Native Method) ~[?:1.8.0_191]
>>
>> at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[?:1.8.0_191]
>>
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>> ~[?:1.8.0_191]
>>
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>> ~[?:1.8.0_191]
>>
>> at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(
>> OffsetCheckpoint.java:79) ~[kafka-streams-2.2.0.jar:?]
>>
>> at
>>
>> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(
>> ProcessorStateManager.java:325) [kafka-streams-2.2.0.jar:?]
>>
>> at org.apache.kafka.streams.processor.internals.StreamTask.suspend(
>> StreamTask.java:599) [kafka-streams-2.2.0.jar:?]
>>
>> at org.apache.kafka.streams.processor.internals.StreamTask.close(
>> StreamTask.java:721) [kafka-streams-2.2.0.jar:?]
>>
>> at org.apache.kafka.streams.processor.internals.AssignedTasks.close(
>> AssignedTasks.java:337) [kafka-streams-2.2.0.jar:?]
>>
>> at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(
>> TaskManager.java:267) [kafka-streams-2.2.0.jar:?]
>>
>> at
>>
>> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(
>> StreamThread.java:1209) [kafka-streams-2.2.0.jar:?]
>>
>> at org.apache.kafka.streams.processor.internals.StreamThread.run(
>> StreamThread.java:786) [kafka-streams-2.2.0.jar:?]
>>
>>
>> However, I have checked and the folder created starts with: *1_*
>>
>> ls -lha /tmp/kafka-stream/XXX/1_1
>> total 8
>> drwxr-xr-x   5 a  b   160B  1 Mar 17:18 .
>> drwxr-xr-x  34 a  b   1.1K  1 Mar 17:15 ..
>> -rw-r--r--   1 a  b   2.9K  1 Mar 17:18 .checkpoint
>> -rw-r--r--   1 a  b 0B  1 Mar 16:05 .lock
>> drwxr-xr-x   3 a  b96B  1 Mar 16:43
>> KSTREAM-REDUCE-STATE-STORE-05
>>
>>
>>
>> Cheers!
>> --
>> Jonathan
>>
>>
>>
>> On Fri, Mar 1, 2019 at 5:11 PM Jonathan Santilli <
>> jonathansanti...@gmail.com>
>> wrote:
>>
>> > Hello John, hope you are well.
>> > I have tested the version 2.2 release candidate (although I know it has
>> > been postponed).
>> > I have been following this email thread because I think am experiencing
>> > the same issue. I have reported in an email to this list and also all
>> the
>> > details are in OS (
>> >
>> https://stackoverflow.com/questions/54145281/why-do-the-offsets-of-the-consumer-group-app-id-of-my-kafka-streams-applicatio
>> > ).
>> >
>> > After the test, the result is the same as before (at least for my case),
>> > already processed records are passed again to the output topic causing
>> the
>> > data duplication:
>> >
>> > ...
>> > 2019-03-01 16:55:23,808 INFO
>> [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
>> > internals.StoreChangelogReader (StoreChangelogReader.java:221) -
>> > stream-thread [XXX-116ba7c8-678e-47f7-9074-7d03627b1e1a-StreamThread-1]
>>

Re: KafkaStreams backoff for non-existing topic

2019-03-25 Thread John Roesler
Hi, Murilo,

I found https://issues.apache.org/jira/browse/KAFKA-7970, which sounds like
the answer is currently "yes". Unfortunately, it is still tricky to handle
this case, although the situation may improve soon.

In the mean time, you can try to work around it with the StateListener.
When Streams has a successful start-up, you'll see it transition from
REBALANCING to RUNNING, so if you see it transition to PENDING_SHUTDOWN,
NOT_RUNNING, or ERROR before you see "oldState: REBALANCING && newState:
RUNNING", you know that Streams did not have a successful startup. It
sounds like you can't determine programmatically *why* this happened, but
you can log a warning or error and then create a new KafkaStreams
object and try starting it again.
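The listener logic described above can be sketched independently of the Kafka API. The State enum here is a stand-in mirroring the relevant org.apache.kafka.streams.KafkaStreams.State names, and onChange has the (newState, oldState) shape of a StateListener callback:

```java
public class StartupWatcher {
    // Stand-in for KafkaStreams.State (assumption: only the transitions
    // relevant to detecting a failed startup are modeled here).
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    boolean startedSuccessfully = false;
    boolean failedStartup = false;

    // Body of a KafkaStreams.StateListener#onChange callback.
    void onChange(State newState, State oldState) {
        if (oldState == State.REBALANCING && newState == State.RUNNING) {
            startedSuccessfully = true; // successful start-up observed
        } else if (!startedSuccessfully
                && (newState == State.PENDING_SHUTDOWN
                    || newState == State.NOT_RUNNING
                    || newState == State.ERROR)) {
            failedStartup = true; // log, then recreate KafkaStreams and retry
        }
    }

    public static void main(String[] args) {
        StartupWatcher w = new StartupWatcher();
        w.onChange(State.REBALANCING, State.CREATED);
        w.onChange(State.NOT_RUNNING, State.REBALANCING);
        System.out.println(w.failedStartup); // true
    }
}
```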

I hope this helps, and feel free to comment on that ticket to add your own
perspective to the issue!

Thanks,
-John

On Fri, Mar 22, 2019 at 3:25 PM Murilo Tavares  wrote:

> Hi
> After some research, I've come to a few discussions, and they all tell me
> that Kafka Streams require the topics to be created before starting the
> application.
> Nevertheless, I'd like my application to keep retrying if a topic does not
> exist.
> I've seen this thread:
> https://groups.google.com/forum/#!topic/confluent-platform/nmfrnAKCM3c,
> which is pretty old, and I'd like to know if it's still hard to catch that
> Exception in my app.
>
> Thanks
> Murilo
>


Re: How to be notified by Kafka stream during partitions rebalancing

2019-04-09 Thread John Roesler
Hi Pierre,

If you're using a Processor (or Transformer), you might be able to use the
`close` method for this purpose. Streams invokes `close` on the Processor
when it suspends the task at the start of the rebalance, when the
partitions are revoked. (It invokes `init` once the rebalance is complete
and the partition is assigned again.)

Does this help?

Thanks,
-John

On Thu, Apr 4, 2019 at 6:39 AM Pierre Coquentin 
wrote:

> thanks :)
>
> On Thu, Apr 4, 2019 at 11:11 AM Dimitry Lvovsky 
> wrote:
>
> > You can detect state changes in your streaming app by implementing
> > KafkaStreams.StateListener,
> > and then registering that with your KafkaStreeams Object e.g new
> > KafkaStreams(...).setStateListener();
> >
> > Hope this helps.
> >
> > On Thu, Apr 4, 2019 at 10:52 AM Pierre Coquentin <
> > pierre.coquen...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > We have a cache in a processor based on assigned partitions and when
> > Kafka
> > > revokes those partitions we would like to flush the cache. The method
> > > punctuate would be neat for that, except that as a client of Kafka
> > > Streams, I am not notified during a revoke.
> > > I found the same question on StackOverflow
> > >
> > >
> >
> https://stackoverflow.com/questions/51626089/kafka-streams-consumerrebalancelistener-implementation
> > > but
> > > nothing in Jira. Is it plan to develop this kind of feature? Or
> perhaps I
> > > missed something?
> > > Regards,
> > >
> > > Pierre
> > >
> >
>


Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

2019-04-30 Thread John Roesler
Hey, Jose,

This is an interesting thought that I hadn't considered before. I think
(tentatively) that windowsFor should *not* take the grace period into
account.

What I'm thinking is that the method is supposed to return  "all windows
that contain the provided timestamp" . When we keep window1 open until
stream time 7, it's because we're waiting to see if some record with a
timestamp in range [0,5) arrives before the overall stream time ticks past
7. But if/when we get that event, its own timestamp is still in the range
[0-5). For example, its timestamp is *not* 6 (because then it would belong
in window2, not window1). Thus, window1 does not "contain" the timestamp 6,
and therefore, windowsFor(6) is not required to return window 1.

Does that seem right to you?
-John

On Thu, Apr 25, 2019 at 6:04 AM Jose Lopez  wrote:

> Hi all,
>
> Given that gracePeriodMs is "the time to admit late-arriving events after
> the end of the window", I'd expect it is taken into account in
> windowsFor(timestamp). E.g.:
>
> sizeMs = 5
> gracePeriodMs = 2
> advanceMs = 3
> timestamp = 6
>
> | window | windowStart | windowEnd | windowEnd + gracePeriod |
> | 1      | 0           | 5         | 7                       |
> | 2      | 5           | 10        | 12                      |
> ...
>
> Current output:
> windowsFor(timestamp) returns window 2 only.
>
> Expected output:
> windowsFor(timestamp) returns both window 1 and window 2
>
> Do you agree with the expected output? Am I missing something?
>
> Regards,
> Jose
>


Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

2019-04-30 Thread John Roesler
Hi Ashok,

I think some people may be able to give you advice, but please start a new
thread instead of replying to an existing message. This just helps keep all
the messages organized.

Thanks!
-John

On Thu, Apr 25, 2019 at 6:12 AM ASHOK MACHERLA  wrote:

> Hii,
>
> what I asking
>
> I want to know about kafka partitions
>
>
> we have getting data about 200GB+ from sources to kafka for daily .
>
> I need to know how many partitions are required to pull data from source
> without pileup.
>
> please suggest us to fix this issue.
>
> is there any mathematical rules to create specific no.of partitions for
> Topic.???
>
>
> please help me
>
> Sent from Outlook
> 
> From: Jose Lopez 
> Sent: 25 April 2019 16:34
> To: users@kafka.apache.org
> Subject: [Streams] TimeWindows ignores gracePeriodMs in
> windowsFor(timestamp)
>
> Hi all,
>
> Given that gracePeriodMs is "the time to admit late-arriving events after
> the end of the window", I'd expect it is taken into account in
> windowsFor(timestamp). E.g.:
>
> sizeMs = 5
> gracePeriodMs = 2
> advanceMs = 3
> timestamp = 6
>
> | window | windowStart | windowEnd | windowEnd + gracePeriod |
> | 1      | 0           | 5         | 7                       |
> | 2      | 5           | 10        | 12                      |
> ...
>
> Current output:
> windowsFor(timestamp) returns window 2 only.
>
> Expected output:
> windowsFor(timestamp) returns both window 1 and window 2
>
> Do you agree with the expected output? Am I missing something?
>
> Regards,
> Jose
>


Re: Changing tumbling windows inclusion

2019-05-07 Thread John Roesler
Hi Alessandro,

Interesting. I agree, messing with the record timestamp to achieve your
goal sounds too messy.

It should be pretty easy to plug in your own implementation of Windows,
instead of using the built-in TimeWindows, if you want slightly different
windowing behavior.

Does that work for you? Feel free to provide more details if you want help
brainstorming.

Thanks,
-John
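For what it's worth, the (lower-exclusive, upper-inclusive] behavior Alessandro wants comes down to a one-line change in the window index arithmetic. A sketch of the bucketing only, not a full Windows implementation:

```java
public class InclusiveUpperBound {
    // Standard tumbling windows: [k*size, (k+1)*size), lower inclusive.
    static long defaultWindowIndex(long ts, long size) {
        return ts / size;
    }

    // Shifted windows: (k*size, (k+1)*size], upper inclusive, so a record
    // stamped exactly at a window end falls into that window, not the next.
    static long shiftedWindowIndex(long ts, long size) {
        return (ts - 1) / size; // assumes ts >= 1
    }

    public static void main(String[] args) {
        long size = 10L;
        System.out.println(defaultWindowIndex(10L, size)); // 1: next window
        System.out.println(shiftedWindowIndex(10L, size)); // 0: previous window
    }
}
```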

On Tue, May 7, 2019 at 1:39 AM Alessandro Tagliapietra <
tagliapietra.alessan...@gmail.com> wrote:

> Hello everyone,
>
> I'm trying to window a stream of machine production data, this use case
> needs a message with timestamp ending at the tumbling window end to be
> included in the current window not the next, because the message production
> amount refers to the previous x seconds. This doesn't work because by the
> docs:
>
> Tumbling time windows are aligned to the epoch, with the lower interval
> > bound being inclusive and the upper bound being exclusive
>
>
> is there a way to have the lower bound exclusive and the upper one
> inclusive?
> Another idea is to change our timestamp extractor and remove 1 second from
> the message timestamp but it's not an option I'd like.
>
> Thank you
>
> --
> Alessandro Tagliapietra
>
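As a rough illustration of John's suggestion, one way a custom window assignment could make the upper bound inclusive is to shift the timestamp by one millisecond before the usual epoch alignment. A plain-Python sketch (not the Streams `Windows` API; names are illustrative):

```python
def default_window_start(ts, size_ms):
    # built-in tumbling behavior: [start, start + size), epoch-aligned
    return ts // size_ms * size_ms


def inclusive_upper_window_start(ts, size_ms):
    # alternative: (start, start + size], so a record landing exactly on
    # a window boundary falls into the *previous* window
    return max(0, (ts - 1) // size_ms * size_ms)
```

In Streams itself, the analogous change would be a custom `Windows` implementation whose window lookup applies this shifted alignment, instead of adjusting timestamps in the extractor.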


Re: Customers getting duplicate emails

2019-05-13 Thread John Roesler
Hi Ashok,

In general, what Ryanne said is correct. For example, if you send an
email, and the send times out (but the email actually did get sent),
your app cannot distinguish this from a failure in which the send
times out before the email makes it out. Therefore, your only option
would be to retry, which may result in duplicate messages. This
obviously has nothing to do with Kafka, it's just a fact of life with
distributed systems.

However, there's some possibility that you can use Kafka's guarantees
to reduce your exposure to duplicate emails... But it's hard to say
without more information about the structure of your application. Feel
free to provide more info, and we'll see if we can give you some
advice.

Thanks,
-John
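As a concrete illustration of the cache idea Ryanne mentions below, a minimal sketch (the message-id scheme and send function are hypothetical; this reduces, but cannot fully eliminate, duplicates, e.g. if the cache itself is lost):

```python
class EmailDeduper:
    """Skip sends whose message id we have already attempted."""

    def __init__(self):
        self.attempted = set()

    def send_once(self, message_id, send_fn):
        if message_id in self.attempted:
            return False                # duplicate delivery attempt: drop it
        self.attempted.add(message_id)  # record first, so a crash mid-send
        send_fn()                       # errs toward "skip" rather than "resend"
        return True
```

In a real deployment the set would need to live somewhere durable and shared across the 10 consumers (and survive rebalances), which is exactly where the residual duplicate risk comes from.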

On Fri, May 10, 2019 at 10:11 AM Ryanne Dolan  wrote:
>
> Kafka only supports exactly-once and idempotency within the context of
> streams apps where records are consumed and produced within the same
> cluster. As soon as you touch the outside world in a non-idempotent way,
> e.g. by sending an email, these guarantees fall away. It is essentially
> impossible to guarantee that an intended action occurs no more than once
> while also guaranteeing that no intended action is skipped. This is a
> problem with distributed systems in general.
>
> You might use something like a cache to prevent most spurious emails -- if
> the cache already has record of the email being sent, don't resend it --
> but this will not prevent all duplicates.
>
> Ryanne
>
> On Fri, May 10, 2019 at 7:26 AM ASHOK MACHERLA  wrote:
>
> > *Dear Team*
> >
> >
> >
> > In our project, for SMS/Email purpose we are using Kafka cluster and
> > Real-time Notification which is our custom application.
> >
> >
> >
> > We are sending to messages from *Kafka to Real-time Notification, and
> > then SMTP Gateway* servers.
> >
> >
> >
> > Our problem is that sometimes customers get the same email multiple
> > times.
> >
> > During this time, the consumer group goes into rebalancing mode.
> >
> >
> >
> > How can we overcome this?
> >
> > Right now, we have 10 partitions for Kafka topic and 10 consumers.
> >
> >
> >
> > Can you please suggest how to fix this?
> >
> >
> >
> > If you required any information/logs, I’ll share to you .
> >
> >
> >
> > Please help us, Thanks
> >
> >
> >
> >
> >
> > Sent from Mail  for
> > Windows 10
> >
> >
> >


Re: Can kafka internal state be purged ?

2019-06-20 Thread John Roesler
Hi!

In addition to setting the grace period to zero (or some small
number), you should also consider the delays introduced by record
caches upstream of the suppression. If you're closely watching the
timing of records going into and coming out of the topology, this
might also spoil your expectations. You could always disable the
record cache to make the system more predictable (although this would
hurt throughput in production).

Thanks,
-John

On Wed, Jun 19, 2019 at 3:01 PM Parthasarathy, Mohan  wrote:
>
> We do explicitly set the grace period to zero. I am going to try the new 
> version
>
> -mohan
>
>
> On 6/19/19, 12:50 PM, "Parthasarathy, Mohan"  wrote:
>
> Thanks. We will give it a shot.
>
> On 6/19/19, 12:42 PM, "Bruno Cadonna"  wrote:
>
> Hi Mohan,
>
> I realized that my previous statement was not clear. With a grace
> period of 12 hour, suppress would wait for late events until stream
> time has advanced 12 hours before a result would be emitted.
>
> Best,
> Bruno
>
> On Wed, Jun 19, 2019 at 9:21 PM Bruno Cadonna  
> wrote:
> >
> > Hi Mohan,
> >
> > if you do not set a grace period, the grace period defaults to 12
> > hours. Hence, suppress would wait for an event that occurs 12 hour
> > later before it outputs a result. Try to explicitly set the grace
> > period to 0 and let us know if it worked.
> >
> > If it still does not work, upgrade to version 2.2.1 if it is 
> possible
> > for you. We had a couple of bugs in suppress recently that are fixed
> > in that version.
> >
> > Best,
> > Bruno
> >
> > On Wed, Jun 19, 2019 at 8:37 PM Parthasarathy, Mohan 
>  wrote:
> > >
> > > No, I have not set any grace period. Is that mandatory ? Have you 
> seen problems with suppress and windows expiring ?
> > >
> > > Thanks
> > > Mohan
> > >
> > > On 6/19/19, 12:41 AM, "Bruno Cadonna"  wrote:
> > >
> > > Hi Mohan,
> > >
> > > Did you set a grace period on the window?
> > >
> > > Best,
> > > Bruno
> > >
> > > On Tue, Jun 18, 2019 at 2:04 AM Parthasarathy, Mohan 
>  wrote:
> > > >
> > > > On further debugging, what we are seeing is that windows 
> are expiring rather randomly as new messages are being processed. We tested 
> with new key for every new message. We waited for the window time before 
> replaying new messages. Sometimes a new message would come in and create 
> state. It takes several messages to make some of the old windows to be closed 
> (go past suppress to the next stage). We have also seen where one of them 
> never closed even but several other older ones expired.  Then we explicitly 
> sent a message with the same old key and then it showed up. Also, for every 
> new message, only one of the previous window expires even though there are 
> several pending.
> > > >
> > > > If we don't use suppress, then there is never an issue. 
> With suppress, the behavior we are seeing is weird. We are using 2.1.0 
> version in DSL mode. Any clues on what we could be missing ? Why isn't there 
> an order in the way windows are closed ? As event time progresses by the new 
> messages arriving, the older ones should expire. Is that right understanding 
> or not ?
> > > >
> > > > Thanks
> > > > Mohan
> > > >
> > > > On 6/17/19, 3:43 PM, "Parthasarathy, Mohan" 
>  wrote:
> > > >
> > > > Hi,
> > > >
> > > > We are using suppress in the application. We see some 
> state being created at some point in time. Now there is no new data for a day 
> or two. We send new data but the old window of data (where we see the state 
> being created) is not closing i.e not seeing it go through suppress and on to 
> the next stage. It is as though the state created earlier was purged. Is this 
> possible ?
> > > >
> > > > Thanks
> > > > Mohan
> > > >
> > > >
> > > >
> > >
> > >
>
>
>
>


Re: Kafka streams dropping events in join when reading from earliest message on topic

2019-06-20 Thread John Roesler
Hi!

You might also want to set MAX_TASK_IDLE_MS_CONFIG =
"max.task.idle.ms" to a non-zero value. This will instruct Streams to
wait the configured amount of time to buffer incoming events on all
topics before choosing any records to process. In turn, this should
cause records to be processed in roughly timestamp order across all
your topics. Without it, Streams might run ahead on one of the topics
before processing events from the others.

You _might_ want to set the idle time higher than your
MAX_POLL_INTERVAL_MS_CONFIG = "max.poll.interval.ms", to be sure that
you actually get a chance to poll for more records before giving up on
the idle.

For any operator that is time-dependent, the maximum _observed_
timestamp is considered the current stream time.

I didn't follow the question about commit interval. It's a fixed
configuration, so you can't make it commit more frequently during the
initial catch-up, but then again, why would you want to? It seems like
you'd want the initial load to go as fast as possible, but committing
more frequently will only slow it down.

I hope this helps,
-John
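To make the buffering behavior concrete, here is a plain-Python model (not Streams internals) of the record-choice rule: once records are buffered on every input, the task processes the one with the smallest timestamp, which yields the roughly timestamp-ordered processing described above.

```python
def next_record(buffers):
    """buffers: dict of topic -> list of (timestamp, value), oldest first.
    Returns (topic, record) for the smallest buffered timestamp, but only
    once every topic has data buffered, which is the situation that
    max.task.idle.ms gives Streams time to reach before it runs ahead."""
    if any(not recs for recs in buffers.values()):
        return None                    # keep idling rather than run ahead
    topic = min(buffers, key=lambda t: buffers[t][0][0])
    return topic, buffers[topic].pop(0)
```

With the idle time set to zero, Streams would instead process whatever is buffered, which is how one topic can "run ahead" of the others during the initial catch-up.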

On Thu, Jun 20, 2019 at 9:00 AM giselle.vandon...@klarrio.com
 wrote:
>
> It seems like there were multiple issues:
> 1. The two streams were read in separately:
> val stream1: KStream[String, String] = builder.stream[String, 
> String](Set("topic1"))
> val stream2: KStream[String, String] = builder.stream[String, 
> String](Set("topic2"))
>  instead of together:
> val rawStreams: KStream[String, String] = builder.stream[String, 
> String](Set("topic1", "topic2"))
> This second option got much more output but still not complete.
> 2.  There are twenty partitions per topic. It seems as if it is not reading 
> equally fast from all topic partitions. When printing the input per thread 
> the timestamps do not accumulate nicely across partitions. If time is tracked 
> on a per-record basis. Do you then take the max timestamp across all 
> partitions and topics as the current event time? If so, if you read faster 
> from one partition than from the other when processing old data, what can you 
> do to make sure this does not get discarded besides putting a very high grace 
> period?
>
> My join window is one second, grace period 50ms, retention time is default.
> I use the timestamp inside the observations. But I have tried with the 
> default TimestampExtractor (log append time) as well, which still did not 
> give all the wanted output.
>
> I am also wondering about what to do with the commit interval? In normal 
> cases this should be on 1000 ms but in the case of this initial startup burst 
> it should output faster?
>
> On 2019/06/17 16:47:11, "Matthias J. Sax"  wrote:
> > > I verified keys and timestamps and they match.
> >
> > Did you verify the timestamps client side, ie, in your Streams application?
> >
> > > When is the watermark for the grace period advanced?
> >
> > There is nothing like a watermark. Time is tracked on a per-record basis.
> >
> > > the event time is the Kafka log append time.
> >
> > If it's log append time, that the broker sets the timestamp. Do you use
> > the embedded record timestamp for the join (default)? Or do you have an
> > embedded timestamps in the value and use an custom `TimestampExtractor`?
> >
> > How large is your join-window, what is your grace period and what's your
> > store retention time?
> >
> >
> > -Matthias
> >
> >
> > On 6/17/19 5:24 AM, giselle.vandon...@klarrio.com wrote:
> > > I verified keys and timestamps and they match.  If I start the publisher 
> > > and processor at the same time, the join has entirely correct output with 
> > > 6000 messages coming in and 3000 coming out.
> > > Putting the grace period to a higher value has no effect.
> > > When is the watermark for the grace period advanced? Per commit interval?
> > > I read from 5 Kafka brokers and the event time is the Kafka log append 
> > > time. Could the ordering across brokers have something to do with it?
> > >
> > > On 2019/06/14 18:33:22, "Matthias J. Sax"  wrote:
> > >> How do you know that the result should be 900,000 messages? Did you
> > >> verify that the keys match and that the timestamps are correct?
> > >>
> > >> Did you try to remove grace-period or set a higher value? Maybe there is
> > >> an issue with ouf-of-order data?
> > >>
> > >> -Matthias
> > >>
> > >> On 6/14/19 5:05 AM, giselle.vandon...@klarrio.com wrote:
> > >>> I have two streams of data flowing into a Kafka cluster. I want to 
> > >>> process this data with Kafka streams. The stream producers are started 
> > >>> at some time t.
> > >>>
> > >>> I start up the Kafka Streams job 5 minutes later and start reading from 
> > >>> earliest from both topics (about 900 000 messages already on each 
> > >>> topic). The job parses the data and joins the two streams. On the 
> > >>> intermediate topics, I see that all the earlier 2x900 000 events are 
> > >>> flowing through until the join. However, only 250 000 are outputted 
>

Re: Can kafka internal state be purged ?

2019-06-21 Thread John Roesler
Sure, the record cache attempts to save downstream operators from
unnecessary updates by also buffering for a short amount of time
before forwarding. It forwards results whenever the cache fills up or
whenever there is a commit. If you're happy to wait at least "commit
interval" amount of time for updates, then you don't need to do
anything, but if you're on the edge of your seat, waiting for these
results, you can set cache.max.bytes.buffering to 0 to disable the
record cache entirely. Note that this would hurt throughput in
general, though.

Just a slight modification:
* a new record with new timestamp > (all the previous timestamps +
grace period) will cause all the old windows *in the same partition*
to close
* yes, expiry of the window depends only on the event time

Hope this helps!
-John
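A plain-Python model of that behavior (not Streams internals; for simplicity the cache size is counted in entries rather than bytes):

```python
class RecordCache:
    """Buffers the latest value per key and forwards downstream only
    when the cache fills up or a commit flushes it."""

    def __init__(self, max_entries, forward):
        self.max_entries = max_entries
        self.forward = forward     # downstream operator
        self.dirty = {}            # latest value per key

    def put(self, key, value):
        if self.max_entries == 0:  # caching disabled: forward immediately
            self.forward(key, value)
            return
        self.dirty[key] = value    # newer value overwrites the older one
        if len(self.dirty) >= self.max_entries:
            self.flush()

    def flush(self):               # also called on every commit
        for k, v in self.dirty.items():
            self.forward(k, v)
        self.dirty.clear()
```

The `max_entries=0` case models setting `cache.max.bytes.buffering` to 0: every update reaches the suppression operator immediately, at the cost of throughput.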

On Thu, Jun 20, 2019 at 11:42 AM Parthasarathy, Mohan  wrote:
>
> Could you tell me a little more about the delays about the record caches and 
> how I can disable it ?
>
>  If I could summarize my problem:
>
> -A new record with a new timestamp > all records sent before, I expect *all* 
> of the old windows to close
> -Expiry of the windows depends only on the event time and not on the key
>
> Are these two statements correct ?
>
> Thanks
> Mohan
>
> On 6/20/19, 9:17 AM, "John Roesler"  wrote:
>
> Hi!
>
> In addition to setting the grace period to zero (or some small
> number), you should also consider the delays introduced by record
> caches upstream of the suppression. If you're closely watching the
> timing of records going into and coming out of the topology, this
> might also spoil your expectations. You could always disable the
> record cache to make the system more predictable (although this would
> hurt throughput in production).
>
> Thanks,
> -John
>
> On Wed, Jun 19, 2019 at 3:01 PM Parthasarathy, Mohan  
> wrote:
> >
> > We do explicitly set the grace period to zero. I am going to try the 
> new version
> >
> > -mohan
> >
> >
> > On 6/19/19, 12:50 PM, "Parthasarathy, Mohan"  wrote:
> >
> > Thanks. We will give it a shot.
> >
> > On 6/19/19, 12:42 PM, "Bruno Cadonna"  wrote:
> >
> > Hi Mohan,
> >
> > I realized that my previous statement was not clear. With a 
> grace
> > period of 12 hour, suppress would wait for late events until 
> stream
> > time has advanced 12 hours before a result would be emitted.
> >
> > Best,
> > Bruno
> >
> > On Wed, Jun 19, 2019 at 9:21 PM Bruno Cadonna 
>  wrote:
> > >
> > > Hi Mohan,
> > >
> > > if you do not set a grace period, the grace period defaults 
> to 12
> > > hours. Hence, suppress would wait for an event that occurs 12 
> hour
> > > later before it outputs a result. Try to explicitly set the 
> grace
> > > period to 0 and let us know if it worked.
> > >
> > > If it still does not work, upgrade to version 2.2.1 if it is 
> possible
> > > for you. We had a couple of bugs in suppress recently that 
> are fixed
> > > in that version.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Wed, Jun 19, 2019 at 8:37 PM Parthasarathy, Mohan 
>  wrote:
> > > >
> > > > No, I have not set any grace period. Is that mandatory ? 
> Have you seen problems with suppress and windows expiring ?
> > > >
> > > > Thanks
> > > > Mohan
> > > >
> > > > On 6/19/19, 12:41 AM, "Bruno Cadonna"  
> wrote:
> > > >
> > > > Hi Mohan,
> > > >
> > > > Did you set a grace period on the window?
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > > On Tue, Jun 18, 2019 at 2:04 AM Parthasarathy, Mohan 
>  wrote:
> > > > >
> > > > > On further debugging, what we are seeing is that 
> windows are expiring rather randomly as new messages are being processed. 
> We tested with new key for 

Re: Can kafka internal state be purged ?

2019-06-21 Thread John Roesler
No problem. It's definitely a subtlety. It occurs because each
partition is processed completely independently of the others, so
"stream time" is tracked per partition, and there's no way to look
across at the other partitions to find out what stream time they have.

In general, it's not a problem because you'd expect all partitions to
receive updates over time, but if you're specifically trying to send
events that cause stuff to get flushed from the buffers, it can mess
with you. It's especially notable in tests. So, for most tests, I just
configure the topics to have one partition.

-John
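A tiny model of that subtlety (illustrative only): each task tracks its own stream time, so a single probe record with a high timestamp advances only the partition it lands in.

```python
class Task:
    """One task per partition; stream time is the max timestamp seen."""

    def __init__(self):
        self.stream_time = -1

    def process(self, ts):
        self.stream_time = max(self.stream_time, ts)


tasks = [Task() for _ in range(4)]         # a 4-partition topic
for partition, ts in [(0, 10), (1, 12), (2, 11)]:
    tasks[partition].process(ts)

tasks[0].process(1_000)                    # "flush" record lands in partition 0 only
```

Only the task for partition 0 would flush its suppression buffer here; the others, including partition 3, which never saw data, keep waiting.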

On Fri, Jun 21, 2019 at 3:56 PM Parthasarathy, Mohan  wrote:
>
> That change "In the same partition" must explain what we are seeing. Unless 
> you see one message per partition, all windows will not expire. That is an 
> interesting twist. Thanks for the correction ( I will go back and confirm 
> this.
>
> -mohan
>
>
> On 6/21/19, 12:40 PM, "John Roesler"  wrote:
>
> Sure, the record cache attempts to save downstream operators from
> unnecessary updates by also buffering for a short amount of time
> before forwarding. It forwards results whenever the cache fills up or
> whenever there is a commit. If you're happy to wait at least "commit
> interval" amount of time for updates, then you don't need to do
> anything, but if you're on the edge of your seat, waiting for these
> results, you can set cache.max.bytes.buffering to 0 to disable the
> record cache entirely. Note that this would hurt throughput in
> general, though.
>
> Just a slight modification:
> * a new record with new timestamp > (all the previous timestamps +
> grace period) will cause all the old windows *in the same partition*
> to close
> * yes, expiry of the window depends only on the event time
>
> Hope this helps!
> -John
>
> On Thu, Jun 20, 2019 at 11:42 AM Parthasarathy, Mohan  
> wrote:
> >
> > Could you tell me a little more about the delays about the record 
> caches and how I can disable it ?
> >
> >  If I could summarize my problem:
> >
> > -A new record with a new timestamp > all records sent before, I expect 
> *all* of the old windows to close
> > -Expiry of the windows depends only on the event time and not on the key
> >
> > Are these two statements correct ?
> >
> > Thanks
> > Mohan
> >
> > On 6/20/19, 9:17 AM, "John Roesler"  wrote:
> >
> > Hi!
> >
> > In addition to setting the grace period to zero (or some small
> > number), you should also consider the delays introduced by record
> > caches upstream of the suppression. If you're closely watching the
> > timing of records going into and coming out of the topology, this
> > might also spoil your expectations. You could always disable the
> > record cache to make the system more predictable (although this 
> would
> > hurt throughput in production).
> >
> > Thanks,
> > -John
> >
> > On Wed, Jun 19, 2019 at 3:01 PM Parthasarathy, Mohan 
>  wrote:
> > >
> > > We do explicitly set the grace period to zero. I am going to try 
> the new version
> > >
> > > -mohan
> > >
> > >
> > > On 6/19/19, 12:50 PM, "Parthasarathy, Mohan"  
> wrote:
> > >
> > > Thanks. We will give it a shot.
> > >
> > > On 6/19/19, 12:42 PM, "Bruno Cadonna"  
> wrote:
> > >
> > > Hi Mohan,
> > >
> > > I realized that my previous statement was not clear. With 
> a grace
> > > period of 12 hour, suppress would wait for late events 
> until stream
> > > time has advanced 12 hours before a result would be 
> emitted.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Wed, Jun 19, 2019 at 9:21 PM Bruno Cadonna 
>  wrote:
> > > >
> > > > Hi Mohan,
> > > >
> > > > if you do not set a grace period, the grace period 
> defaults to 12
> > > > hours. Hence, suppress would wait for an event that 
> occurs 12 hour
>   

Re: Can kafka internal state be purged ?

2019-06-24 Thread John Roesler
Hey, this is a very apt question.

GroupByKey isn't a great example because it doesn't actually change
the key, so all the aggregation results are actually on records from
the same partition. But let's say you do a groupBy or a map (or any
operation that can change the key), followed by an aggregation. Now
it's possible that the aggregation would need to process records from
two different partitions. In such a case (key-changing operation
followed by a stateful operation), Streams actually round-trips the
data through an intermediate topic, called a repartition topic, before
the aggregation. This has the effect, similar to the "shuffle" phase
of map-reduce, of putting all the data into its *new* right partition,
so then the aggregation can still process each of its partitions
independently.

Regarding the latter statement, even though you only have one
instance, Streams _still_ processes each partition independently. The
"unit of work" responsible for processing a partition is called a
"task". So if you have 4 partitions, then your one instance actually
has 4 state stores, one for each task, where each task only gets
records from a single partition. The tasks can't see anything about
each other, not their state nor other metadata like their current
stream time. Otherwise, the results would depend on which tasks happen
to be co-located with which other tasks. So, having to send your
"purge" event to all partitions is a pain, but in the end, it buys you
a lot, as you can add another instance to your cluster at any time,
and Streams will scale up, and you'll know that the program is
executing exactly the same way the whole time.

-John
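A small sketch of that shuffle (the partitioner here is a deterministic stand-in, not Kafka's murmur2, and the record values are made up):

```python
def partition_for(key, num_partitions):
    return sum(key.encode()) % num_partitions   # stand-in partitioner

records = [("user-1", "click"), ("user-2", "click"), ("user-1", "page_view")]
num_partitions = 3

# key-changing map: re-key each record by its event type
mapped = [(value, key) for key, value in records]

# "repartition topic": each record is written to its *new* key's partition,
# so the downstream aggregation task sees every record for its keys
repartitioned = {p: [] for p in range(num_partitions)}
for key, value in mapped:
    repartitioned[partition_for(key, num_partitions)].append((key, value))
```

After the shuffle, every "click" record sits in one partition regardless of which partition its original user key came from, so that partition's task can aggregate them independently.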

On Sat, Jun 22, 2019 at 4:37 PM Parthasarathy, Mohan  wrote:
>
> I can see the issue. But it raised other questions. Pardon my ignorance. Even 
> though partitions are processed independently, windows can be aggregating 
> state from records read from many partitions. Let us say there is a 
> groupByKey followed by aggregate. In this case how is the state reconciled 
> across all the application instances ? Is there a designated instance for a 
> particular key ?
>
> In my case, there was only one instance processing records from all 
> partitions and it is kind of odd that windows did not expire even though I 
> understand why now.
>
> Thanks
> Mohan
>
>
> On 6/21/19, 2:25 PM, "John Roesler"  wrote:
>
> No problem. It's definitely a subtlety. It occurs because each
> partition is processed completely independently of the others, so
> "stream time" is tracked per partition, and there's no way to look
> across at the other partitions to find out what stream time they have.
>
> In general, it's not a problem because you'd expect all partitions to
> receive updates over time, but if you're specifically trying to send
> events that cause stuff to get flushed from the buffers, it can mess
> with you. It's especially notable in tests. So, for most tests, I just
> configure the topics to have one partition.
>
> -John
>
> On Fri, Jun 21, 2019 at 3:56 PM Parthasarathy, Mohan  
> wrote:
> >
> > That change "In the same partition" must explain what we are seeing. 
> Unless you see one message per partition, all windows will not expire. That 
> is an interesting twist. Thanks for the correction ( I will go back and 
> confirm this.
> >
> > -mohan
> >
> >
> > On 6/21/19, 12:40 PM, "John Roesler"  wrote:
> >
> > Sure, the record cache attempts to save downstream operators from
> > unnecessary updates by also buffering for a short amount of time
> > before forwarding. It forwards results whenever the cache fills up 
> or
> > whenever there is a commit. If you're happy to wait at least "commit
> > interval" amount of time for updates, then you don't need to do
> > anything, but if you're on the edge of your seat, waiting for these
> > results, you can set cache.max.bytes.buffering to 0 to disable the
> > record cache entirely. Note that this would hurt throughput in
> > general, though.
> >
> > Just a slight modification:
> > * a new record with new timestamp > (all the previous timestamps +
> > grace period) will cause all the old windows *in the same partition*
> > to close
> > * yes, expiry of the window depends only on the event time
> >
> > Hope this helps!
> > -John
> >
> > On Thu, Jun 20, 2019 at 11:42 AM Par

Re: replace usage of TimeWindows.until() from Kafka Streams 2.2

2019-06-24 Thread John Roesler
Hey Sendoh,

I think you just overlooked the javadoc in your search, which says:

> @deprecated since 2.1. Use {@link Materialized#withRetention(Duration)} or 
> directly configure the retention in a store supplier and use {@link 
> Materialized#as(WindowBytesStoreSupplier)}.

Sorry for the confusion,
-John

On Mon, Jun 24, 2019 at 5:05 AM unicorn.bana...@gmail.com
 wrote:
>
> Hi Kafka Streams user,
>
> I have this usage of  Kafka Streams and it works well that sets retention 
> time in KTable, both in the internal topics and RocksDB local states.
>
> final KStream<Integer, String> eventStream = builder
> .stream("events",
> Consumed.with(Serdes.Integer(), Serdes.String())
> .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST));
>
> eventStream.groupByKey()
> .windowedBy(TimeWindows.of(Duration.ofSeconds(200)).until(Duration.ofSeconds(3000).toMillis()))
> .reduce((oldValue, newValue) -> newValue);
>
> I saw until() is deprecated from 2.2. What would be the replacement of such 
> usage?
> I checked the Materialized related document but cannot find any
>
> Best,
>
> Sendoh


Re: Can kafka internal state be purged ?

2019-06-26 Thread John Roesler
Hi Mohan,

I see where you're going with this, and it might indeed be a
challenge. Even if you send a "dummy" message on all input topics, you
won't have a guarantee that after the repartition, the dummy message
is propagated to all partitions of the repartition topics. So it might
be difficult to force the suppression buffer to flush if it's after a
repartition.

Can we take a step back and discuss the motivation for forcing the
records to flush out? Is this for testing your app, or is it to drive
some production logic?

Thanks,
-John


On Mon, Jun 24, 2019 at 7:26 PM Parthasarathy, Mohan  wrote:
>
> John,
>
> Thanks for the nice explanation. When the repartitioning happens, does the 
> window get associated with the new partition i.e., now does a message with 
> new timestamp has to appear on the repartition topic for the window to expire 
> ? It is possible that there is new stream of messages coming in but post-map 
> operation, the partitions in the repartitioned topic does not see the same 
> thing.
>
> Thanks
> Mohan
>
> On 6/24/19, 7:49 AM, "John Roesler"  wrote:
>
> Hey, this is a very apt question.
>
> GroupByKey isn't a great example because it doesn't actually change
> the key, so all the aggregation results are actually on records from
> the same partition. But let's say you do a groupBy or a map (or any
> operation that can change the key), followed by an aggregation. Now
> it's possible that the aggregation would need to process records from
> two different partitions. In such a case (key-changing operation
> followed by a stateful operation), Streams actually round-trips the
> data through an intermediate topic, called a repartition topic, before
> the aggregation. This has the effect, similar to the "shuffle" phase
> of map-reduce, of putting all the data into its *new* right partition,
> so then the aggregation can still process each of its partitions
> independently.
>
> Regarding the latter statement, even though you only have one
> instance, Streams _still_ processes each partition independently. The
> "unit of work" responsible for processing a partition is called a
> "task". So if you have 4 partitions, then your one instance actually
> has 4 state stores, one for each task, where each task only gets
> records from a single partition. The tasks can't see anything about
> each other, not their state nor other metadata like their current
> stream time. Otherwise, the results would depend on which tasks happen
> to be co-located with which other tasks. So, having to send your
> "purge" event to all partitions is a pain, but in the end, it buys you
> a lot, as you can add another instance to your cluster at any time,
> and Streams will scale up, and you'll know that the program is
> executing exactly the same way the whole time.
>
> -John
>
> On Sat, Jun 22, 2019 at 4:37 PM Parthasarathy, Mohan  
> wrote:
> >
> > I can see the issue. But it raised other questions. Pardon my 
> ignorance. Even though partitions are processed independently, windows can be 
> aggregating state from records read from many partitions. Let us say there is 
> a groupByKey followed by aggregate. In this case how is the state reconciled 
> across all the application instances ? Is there a designated instance for a 
> particular key ?
> >
> > In my case, there was only one instance processing records from all 
> partitions and it is kind of odd that windows did not expire even though I 
> understand why now.
> >
> > Thanks
> > Mohan
> >
> >
> > On 6/21/19, 2:25 PM, "John Roesler"  wrote:
> >
> > No problem. It's definitely a subtlety. It occurs because each
> > partition is processed completely independently of the others, so
> > "stream time" is tracked per partition, and there's no way to look
> > across at the other partitions to find out what stream time they 
> have.
> >
> > In general, it's not a problem because you'd expect all partitions 
> to
> > receive updates over time, but if you're specifically trying to send
> > events that cause stuff to get flushed from the buffers, it can mess
> > with you. It's especially notable in tests. So, for most tests, I 
> just
> > configure the topics to have one partition.
> >
> > -John
> >
> > On Fri, Jun 21, 2019 at 3:56 PM Parthasarathy,

Re: Can kafka internal state be purged ?

2019-06-28 Thread John Roesler
Ok, good, that's what I was hoping. I think that's a good strategy, at
the end of the "real" data, just write a dummy record with the same
keys with a high timestamp to flush everything else through.

For the most part, I'd expect a production program to get a steady
stream of traffic with increasing timestamps, so windows would be
constantly be getting flushed out as stream time moves forward.

Some folks have reported that they don't get enough traffic through
their program to flush out the suppressed results on a regular basis,
though. Right now, the best solution is to have the same Producer that
writes data to the input topics also write "heartbeat/dummy" records
periodically when there is no data to send, just to keep stream time
moving forward. But this isn't a perfect solution, as you have pointed
out in this thread; you really want the heartbeat records to go to all
partitions, and also go to all re-partitions if there are repartition
topics in the topology.

I agree that there seems to be a need for some first-class support for
keeping stream time moving reliably. I think that the ideal would be
to allow Producers to automatically send the heartbeats and to
implement a variant of the Chandy-Lamport distributed snapshot
algorithm to push them though the whole topology while skipping any
actual computations (so your business logic wouldn't have to see
them). I'd really love to see this feature in Streams; I just haven't
written it up yet because I haven't had time.
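The heartbeat workaround can be sketched like so (a plain model; the send function, payload, and partition handling are illustrative, not the Producer API):

```python
def send_heartbeats(send, num_partitions, now_ms):
    """Write one dummy record per partition so that every task's
    stream time advances even when there is no real traffic."""
    for partition in range(num_partitions):
        # the payload is ignored by business logic; only the timestamp matters
        send(partition=partition, key=None, value="heartbeat", timestamp=now_ms)


sent = []
send_heartbeats(lambda **record: sent.append(record), num_partitions=4, now_ms=1_000)
```

Note this only covers the input topics; as discussed above, heartbeats do not automatically propagate to every partition of downstream repartition topics.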

During the design for Suppress, we did consider having some kind of
timer, but the problem is that it's not possible for this to be
deterministic. If you want the "until window closes" version, you're
supposed to get a guarantee that you'll really only see one, final,
result for each window/key. If we were to use the system clock on the
Streams machine to decide it's probably been long enough and emit the
"final" result, but it turns out that we actually had just stalled
(maybe waiting for quorum during a broker upgrade or something, or
just a run-of-the-mill networking problem) and the next record we poll
was supposed to be in the window, what can we do? We already emitted
the "final" result, so we can say "oops, that wasn't the final result,
_this_ one is the final result", but that seems to render the words
"final result" kind of meaningless. On the other hand, we can just
drop that record and ignore it, but that's a bummer because the only
reason we couldn't include it in the result was some ephemeral
environmental problem. If we run the same data through the same
program again, we'd get a different result.

So for "until window closes" mode, where we guarantee you _only_ see
the final results, we only offer stream time expiration. Anything else
would violate correctness one way or another. On the other hand, you
have "until time limit" mode. In that case, it's just "buffer for a
while, but multiple results are still ok" semantics. For that case, we
have 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-424%3A+Allow+suppression+of+intermediate+events+based+on+wall+clock+time
, to indeed use a timer to emit events if too much time passes on the
Streams side. It's just not implemented yet.

Does this all seem about right to you?
-john

On Wed, Jun 26, 2019 at 12:57 PM Parthasarathy, Mohan  wrote:
>
> Initially it started in the testing. QA reported problems where "events" were 
> not detected after they finished their testing. After this discussion, my 
> proposal was to send a few more records to cause the windows to flush so that 
> the suppressed event would show up. Now it looks to me, these few dummy 
> records have to match the "key" of the pending windows. Then it would be 
> flushed.
>
> In practice, it may not always be a problem. But the real-time nature of 
> the problem might require that there not be a huge delay between the 
> processing of the event and the flush. How does one solve this issue in 
> production ? I am wondering why the design did not accommodate a timer to 
> flush the windows ?
>
> Thanks
> Mohan
>
>
> On 6/26/19, 8:18 AM, "John Roesler"  wrote:
>
> Hi Mohan,
>
> I see where you're going with this, and it might indeed be a
> challenge. Even if you send a "dummy" message on all input topics, you
> won't have a guarantee that after the repartition, the dummy message
> is propagated to all partitions of the repartition topics. So it might
> be difficult to force the suppression buffer to flush if it's after a
> repartition.
>
> Can we take a step back and discuss the motivation for forcing the
> records to flush out? 

Re: Kafka streams (2.1.1) - org.rocksdb.RocksDBException:Too many open files

2019-06-28 Thread John Roesler
Hey all,

If you want to figure it out theoretically, if you print out the
topology description, you'll have some number of state stores listed
in there. The number of Rocks instances should just be
(#global_state_stores +
sum(#partitions_of_topic_per_local_state_store)) . The number of
stream threads isn't relevant here.

You can also figure it out empirically: the first level of
subdirectories in the state dir are Tasks, and then within that, the
next level is Stores. You should see the store directory names match
up with the stores listed in the topology description. The number of
Store directories is exactly the number of RocksDB instances you have.

There are also metrics corresponding to each of the state stores, so
you can compute it from what you find in the metrics.
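The formula above is easy to work through with concrete numbers. The topology shape in this sketch is hypothetical, but the arithmetic is the same for any topology:

```java
// Worked example of the formula above:
// #instances = #global_state_stores + sum(#partitions per local state store).
class RocksDbInstanceCount {
    static int count(int globalStores, int[] partitionsPerLocalStore) {
        int total = globalStores;
        for (int partitions : partitionsPerLocalStore) {
            total += partitions; // one RocksDB instance per store partition
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical topology: 1 global store, plus 10 local stores each
        // backed by an 8-partition topic.
        int[] locals = new int[10];
        java.util.Arrays.fill(locals, 8);
        System.out.println(count(1, locals)); // 81 RocksDB instances
    }
}
```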

Hope that helps,
-john

On Thu, Jun 27, 2019 at 6:46 AM Patrik Kleindl  wrote:
>
> Hi Kiran
> Without much research my guess would be "num_stream_threads *
> (#global_state_stores + sum(#partitions_of_topic_per_local_state_store))"
> So 10 stores (regardless if explicitly defined or implicitly because of
> some stateful operation) with 10 partitions each should result in 100
> Rocksdb instances if you are running at the default of num_stream_threads=1.
>
> As I wrote before, start with 100.
> If the error persists, halve the number, if not, double it ;-) Repeat as
> needed.
>
> If you reach the single-digit-range and the error still shows up, start
> searching for any iterators over a store you might not have closed.
>
> br, Patrik
>
> On Thu, 27 Jun 2019 at 13:11, emailtokir...@gmail.com <
> emailtokir...@gmail.com> wrote:
>
> >
> >
> > On 2019/06/27 09:02:39, Patrik Kleindl  wrote:
> > > Hello Kiran
> > >
> > > First, the value for maxOpenFiles is per RocksDB instance, and the number
> > > of those can get high if you have a lot of topic partitions etc.
> > > Check the directory (state dir) to see how many there are.
> > > Start with a low value (100) and see if that has some effect.
> > >
> > > Second, because I just found out, you should use
> > > BlockBasedTableConfig tableConfig = (BlockBasedTableConfig)
> > > options.tableFormatConfig();
> > > tableConfig.setBlockCacheSize(100*1024*1024L);
> > > tableConfig.setBlockSize(8*1024L);
> > > instead of creating a new object to prevent accidentally messing up
> > > references.
> > >
> > > Hope that helps
> > > best regards
> > > Patrik
> > >
> > > On Thu, 27 Jun 2019 at 10:46, emailtokir...@gmail.com <
> > > emailtokir...@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > On 2019/06/26 21:58:02, Patrik Kleindl  wrote:
> > > > > Hi Kiran
> > > > > You can use the RocksDBConfigSetter and pass
> > > > >
> > > > > options.setMaxOpenFiles(100);
> > > > >
> > > > > to all RocksDBs for the Streams application which limits how many are
> > > > > kept open at the same time.
> > > > >
> > > > > best regards
> > > > >
> > > > > Patrik
> > > > >
> > > > >
> > > > > On Wed, 26 Jun 2019 at 16:14, emailtokir...@gmail.com <
> > > > > emailtokir...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We are using Kafka streams DSL APIs for doing some counter
> > aggregations
> > > > > > (running on OpenJDK 11.0.2). Our topology has some 400 sub
> > topologies
> > > > & we
> > > > > > are using 8 partitions in source topic. When we start pumping more
> > > > load, we
> > > > > > start getting RocksDBException stating "too many open files".
> > > > > >
> > > > > > Here are the stack trace samples:
> > > > > >
> > > > > >
> > > >
> > --
> > > > > > Caused by: org.rocksdb.RocksDBException: while open a file for
> > lock:
> > > > > > PPP.151200/LOCK: Too many open files
> > > > > > at org.rocksdb.RocksDB.open(Native Method)
> > > > > > at org.rocksdb.RocksDB.open(RocksDB.java:235)
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:156)
> > > > > > ... 24 common frames omitted
> > > > > >
> > > > > >
> > > > > > Caused by: org.apache.kafka.streams.errors.ProcessorStateException:
> > > > Error
> > > > > > while executing flush from store XXX.151200
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:397)
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:388)
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.Segments.flush(Segments.java:163)
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.flush(RocksDBSegmentedBytesStore.java:178)
> > > > > > at
> > > > > >
> > > >
> > org.apache.kafka.streams.state.internals.WrappedStateStore$AbstractStateStore.flush(WrappedStateStore.java:85)
> > > > > > at
> > > > > >
>

Re: Message reprocessing logic

2019-07-09 Thread John Roesler
Hi Alessandro,

Sorry if I'm missing some of the context, but could you just keep
retrying the API call inside a loop? This would block any other
processing by the same thread, but it would allow Streams to stay up
in the face of transient failures. Otherwise, I'm afraid that throwing
an exception is the right thing to do. Streams would re-process the
record in question when it starts back up, but you'd have to re-start
it. You can do that programmatically, but it's a bit heavyweight as a
response to a transient API call failure.

For reference, this is one of several problems that comes up when you
need to call out to external services during processing. Streams
currently lacks full support to make this a really pleasant
experience, but it's a perennial topic of discussion. See
https://cwiki.apache.org/confluence/display/KAFKA/KIP-311%3A+Async+processing+with+dynamic+scheduling+in+Kafka+Streams
and 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-408%3A+Add+Asynchronous+Processing+To+Kafka+Streams
for a couple of attempts to wrestle with the domain.

To answer your latter question, the store should be returned to its
prior state when you restart, but if you want to be absolutely sure
this happens, you need to enable EOS. That will have the side-effect
of discarding any local state after a crash, though, which makes the
"crash and recover" strategy even more heavyweight.

I'd recommend wrapping the API call in a retry loop that's as long as
you can tolerate and then crashing if you still don't get through. Be
sure to also look through the docs and find any heartbeat configs you
need to set. Off the top of my head, I think "max poll interval" at
least needs to be set bigger than your maximum expected pause.
Probably 2x the total retry-loop time would be a good choice.
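A generic shape for that retry loop, as a hedged sketch: plain Java with no Kafka types, where the attempt count, backoff, and the flaky call itself are all illustrative. The total worst-case time (maxAttempts × backoffMs plus call time) is what you'd size against `max.poll.interval.ms`:

```java
import java.util.function.Supplier;

// Retry wrapper for a flaky external call made during processing.
class Retry {
    static <T> T withRetries(Supplier<T> call, int maxAttempts, long backoffMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                try {
                    // Keep maxAttempts * backoffMs well under max.poll.interval.ms.
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        // Out of attempts: rethrow so the application crashes and can restart.
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```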

I hope this helps,
-John

On Fri, Jul 5, 2019 at 6:30 PM Alessandro Tagliapietra
 wrote:
>
> Hello everyone,
>
> I'm looking into a way to reprocess messages in case of soft-errors (not
> exceptions)
> For example we have a topology that does this:
> input stream -> filtering/flatmap -> window and aggregate
>
> in our aggregate step (maybe should be moved into an additional step) we
> make an API call to one of our services.
>
> What I would like to do is to reprocess that message, even better if
> possible just the window computation when the API call fails.
>
> By reading this
> https://docs.confluent.io/current/streams/concepts.html#streams-concepts-processing-guarantees
> if
> I'm not mistaken with the default at least one semantic, if I throw an
> exception the topology will reprocess the messages after the last commit,
> is it possible instead to just soft-retry the last message without throwing
> an exception and possibly reprocess also older correctly processed messages?
>
> Also, if my topology starts from a stream uses multiple stores before
> windowing, if there's an error in the windowing step, what happens to the
> stores changes? When the message is reprocessed, will the store be in the
> state it was after it processed the message on the first try?
>
> Thank you in advance
>
> --
> Alessandro Tagliapietra


Re: Reducing streams startup bandwidth usage

2019-12-02 Thread John Roesler
Hi Alessandro,

I'm sorry to hear that.

The restore process only takes one factor into account: the current offset 
position of the changelog topic is stored in a local file alongside the state 
stores. On startup, the app checks if the recorded position lags the latest 
offset in the changelog. If so, then it reads the missing changelog records 
before starting processing.

Thus, it would not restore any old window data.
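The restore decision described above can be sketched in a few lines (illustrative only; the real implementation lives in Streams' changelog reader):

```java
// Replay only the changelog records between the checkpointed offset and the
// log-end offset; with no checkpoint, the whole store must be rebuilt.
class RestoreDecision {
    static long recordsToRestore(Long checkpointedOffset, long logEndOffset) {
        if (checkpointedOffset == null) {
            return logEndOffset; // no checkpoint file: rebuild from the beginning
        }
        return Math.max(0L, logEndOffset - checkpointedOffset);
    }

    public static void main(String[] args) {
        System.out.println(recordsToRestore(712_713L, 715_000L)); // 2287
        System.out.println(recordsToRestore(null, 715_000L));     // 715000: full rebuild
    }
}
```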

There might be a few different things going on to explain your observation:
* if there is more than one instance in your Streams cluster, maybe the task is 
"flopping" between instances, so each instance still has to recover state, 
since it wasn't the last one actively processing it.
* if the application isn't stopped gracefully, it might not get a chance to 
record its offset in that local file, so on restart it has to restore some or 
all of the state store from changelog.

Or it could be something else; that's just what comes to mind.

If you want to get to the bottom of it, you can take a look at the logs, paying 
close attention to which tasks are assigned to which instances after each 
restart. You can also look into the logs from 
`org.apache.kafka.streams.processor.internals.StoreChangelogReader` (might want 
to set it to DEBUG or TRACE level to really see what's happening).

I hope this helps!
-John

On Sun, Dec 1, 2019, at 21:25, Alessandro Tagliapietra wrote:
> Hello everyone,
> 
> we're having a problem with bandwidth usage on streams application startup,
> our current setup does this:
> 
> ...
> .groupByKey()
> .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
> .aggregate(
> { MetricSequenceList(ArrayList()) },
> { key, value, aggregate ->
> aggregate.getRecords().add(value)
> aggregate
> },
> Materialized.`as`<String, MetricSequenceList, WindowStore<Bytes, ByteArray>>("aggregate-store").withKeySerde(Serdes.String()).withValueSerde(Settings.getValueSpecificavroSerde())
> )
> .toStream()
> .flatTransform(TransformerSupplier {
> ...
> 
> basically in each window we append the new values and then do some other
> logic with the array of windowed values.
> The aggregate-store changelog topic configuration  uses compact,delete as
> cleanup policy and has 12 hours of retention.
> 
> What we've seen is that on application startup it takes a couple minutes to
> rebuild the state store, even if the state store directory is persisted
> across restarts. That along with an exception that caused the docker
> container to be restarted a couple hundreds times caused a big confluent
> cloud bill compared to what we usually spend (1/4 of a full month in 1 day).
> 
> What I think is happening is that the topic is keeping all the previous
> windows even with the compacting policy because each key is the original
> key + the timestamp not just the key. Since we don't care about previous
> windows as the flatTransform after the toStream() makes sure that we don't
> process old windows (a custom suppressor basically) is there a way to only
> keep the last window so that the store rebuilding goes faster and without
> rebuilding old windows too? Or should I create a custom window using the
> original key as key so that the compaction keeps only the last window data?
> 
> Thank you
> 
> --
> Alessandro Tagliapietra
>


Re: KafkaStreams internal producer order guarantee

2019-12-03 Thread John Roesler
Hi Murilo,

For this case, you don’t have to worry. Kafka Streams provides the guarantee 
you want by default. 

Let us know if you want/need more information!

Cheers,
John

On Tue, Dec 3, 2019, at 08:59, Murilo Tavares wrote:
> Hi Mathias
> Thank you for your feedback.
> I'm still a bit confused about what approach one should take. My
> KafkaStreams application is pretty standard for KafkaStreams: it takes a
> few Table-like topics, group and aggregates some of them so we can join
> with others. Something like this:
> 
> KTable left = builder.table()
> KTable right = builder.table()
> var grouped = right.groupBy(//new key/value).aggregate(...)
> left.leftJoin(grouped, //myFuncion).toStream(...)
> 
> Input and output topics are all Table-like topics, so I understand I need
> "at least once" guarantee, but also need order guarantee at least for the
> same Key. I mean, if you send 2 updates to the same key, I need a guarantee
> I'll have the latest value for that key in the output topic. Is there a
> recommended configuration for this?
> Thanks again
> Murilo
> 
> On Tue, 3 Dec 2019 at 04:29, Matthias J. Sax  wrote:
> 
> > That is correct. It depends on what guarantees you need though. Also
> > note, that producers often write into repartition topics to re-key data
> > and for this case, no ordering guarantee can be provided anyway, as the
> > single writer principle is "violated".
> >
> > Also note, that Kafka Streams can handle out-of-order data for most
> > cases correctly and thus it should be ok to leave the default config
> > values.
> >
> > But as always: it depends on your application and your requirements. As
> > a rule of thumb: as long as you don't experience any issue, I would just
> > go with default configs.
> >
> >
> > -Matthias
> >
> >
> > On 12/2/19 12:02 PM, Murilo Tavares wrote:
> > > Hi everyone
> > > In light of the discussions about order guarantee in Kafka, I am
> > struggling
> > > to understand how that affects KafkaStreams internal *KafkaProducer*.
> > > In the official documentation, this section (
> > >
> > https://docs.confluent.io/current/streams/concepts.html#out-of-order-handling
> > )
> > > enumerates
> > > 2 causes "that could potentially result in out-of-order data *arrivals*
> > > with respect to their timestamps".
> > > But I haven't found anything that mentioned how KafkaStreams *producers*
> > > will handle errors, and how that could lead to out-of-order messages
> > being
> > > produced in output topics.
> > > When I start my KafkaStreams application, I've seen the internal
> > producers
> > > use the below in its default configuration:
> > > enable.idempotence = false
> > > max.in.flight.requests.per.connection = 5
> > > retries = 2147483647
> > >
> > > So I guess that this could mean that at the end of my topology,
> > > KafkaStreams could potentially send out of order messages to an output
> > > topic if for some reason the message fails to be delivered to the broker,
> > > as the internal producer would retry that.
> > >
> > > I've read that to guarantee order in the producers, one needs to set
> > > "max.in.flight.requests.per.connection=1". But I wonder if one should
> > > override this configuration for KafkaStreams applications?
> > >
> > > Thanks
> > > Murilo
> > >
> >
> >
>


Re: Ordering of messages in the same kafka streams sub-topology with multiple sinks for the same topic

2019-12-03 Thread John Roesler
Hi Vasily,

Probably in this case, with the constraints you’re providing, the first branch 
would output first, but I wouldn’t depend on it. Any small change in your 
program could mess this up, and also any change in Streams could alter the 
exact execution order also. 

The right way to think about these programs is as “data flows”. You’re taking a 
stream of data and defining two separate branches into smaller streams, and 
then later on merging those back into one stream. In general, there would be no 
defined ordering, just like if you imagine doing the same thing with literal 
water streams. 

If you want a guarantee about the relative ordering, you’d have to use a 
specific operator that does what you want. If nothing else comes to mind, a 
custom transformer or processor that receives records from both branches and 
buffers those from the second branch until the corresponding record from the 
first branch has been emitted would do the trick. 
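The buffering idea can be sketched in plain Java (no Kafka types; the class and method names are illustrative). It assumes each branch emits exactly one output per source record, keyed by the source record's offset:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Holds branch-2 results until the branch-1 result for the same source
// offset has been emitted, so branch 1 always precedes branch 2 per record.
class OrderedMerge {
    private final Map<Long, String> pendingBranch2 = new HashMap<>();
    private final Set<Long> branch1Done = new HashSet<>();
    final List<String> emitted = new ArrayList<>();

    void onBranch1(long sourceOffset, String value) {
        emitted.add(value);
        branch1Done.add(sourceOffset);
        String held = pendingBranch2.remove(sourceOffset);
        if (held != null) {
            emitted.add(held); // release the buffered branch-2 result
        }
    }

    void onBranch2(long sourceOffset, String value) {
        if (branch1Done.contains(sourceOffset)) {
            emitted.add(value); // branch 1 already went out; safe to emit
        } else {
            pendingBranch2.put(sourceOffset, value); // wait for branch 1
        }
    }

    public static void main(String[] args) {
        OrderedMerge merge = new OrderedMerge();
        merge.onBranch2(0, "b2-0"); // branch 2 happens to run first
        merge.onBranch1(0, "b1-0");
        merge.onBranch1(1, "b1-1");
        merge.onBranch2(1, "b2-1");
        System.out.println(merge.emitted); // [b1-0, b2-0, b1-1, b2-1]
    }
}
```

A real version would live in a transformer with a state store backing the pending map, so the buffer survives restarts.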

Thanks,
John

On Tue, Dec 3, 2019, at 06:32, Vasily Sulatskov wrote:
> Hello,
> 
> I wonder if ordering of the messages is preserved by kafka streams when 
> the messages are processed by kafka streams when 
> redistribution and in the end there are multiple sinks for the same 
> topic. 
> 
> I couldn't find the answer to this question in the docs/mailing 
> list/stack overflow.
> 
> You can arrive to this situation with the code like this:
>  
> val source = builder.stream[Key, Value]("input")
> source
>   .filter(...)
>   .mapValues(...)
>   .transform(...)
>   .to("output")
> 
> source
>   .filter(...)
>   .mapValues(...)
>   .transform(...)
>   .to("output")
> 
> Basically it's two different processing branches, that process each 
> input value slightly differently. I.e. if one branch produces a 
> message, in response to an input message, the other branch will produce 
> the message as well. So keeping the ordering in this case means, all 
> messages produced for earlier source messages on one branch should 
> precede messages produced by the other branch for later source messages.
> 
> Here's my topology:
> 
>   Sub-topology: 2
> Source: KSTREAM-SOURCE-19 (topics: [input])
>   --> KSTREAM-MAPVALUES-20
> Processor: KSTREAM-MAPVALUES-20 (stores: [])
>   --> KSTREAM-MAPVALUES-22, KSTREAM-TRANSFORM-21
>   <-- KSTREAM-SOURCE-19
> Processor: KSTREAM-MAPVALUES-22 (stores: [])
>   --> KSTREAM-TRANSFORM-23
>   <-- KSTREAM-MAPVALUES-20
> Processor: KSTREAM-TRANSFORM-21 (stores: [store1])
>   --> KSTREAM-MAP-27
>   <-- KSTREAM-MAPVALUES-20
> Processor: KSTREAM-TRANSFORM-23 (stores: [store2])
>   --> KSTREAM-MAP-24
>   <-- KSTREAM-MAPVALUES-22
> Processor: KSTREAM-MAP-24 (stores: [])
>   --> KSTREAM-FILTER-25
>   <-- KSTREAM-TRANSFORM-23
> Processor: KSTREAM-MAP-27 (stores: [])
>   --> KSTREAM-FILTER-28
>   <-- KSTREAM-TRANSFORM-21
> Processor: KSTREAM-FILTER-25 (stores: [])
>   --> KSTREAM-SINK-26
>   <-- KSTREAM-MAP-24
> Processor: KSTREAM-FILTER-28 (stores: [])
>   --> KSTREAM-SINK-29
>   <-- KSTREAM-MAP-27
> Sink: KSTREAM-SINK-26 (topic: output)
>   <-- KSTREAM-FILTER-25
> Sink: KSTREAM-SINK-29 (topic: output)
>   <-- KSTREAM-FILTER-28
> 
> On one hand I guess that it all information coming from one partition 
> will be processed by one thread, so it can keep the order of the 
> messages, but on the other hand I see two independent sinks in the 
> topology, with independent buffers etc I guess. So in the end I am not 
> sure what's going to happen.
> 
> I would guess that it can work because sinks probably have the same 
> buffer size, but it's not guaranteed. I can imagine a following failure 
> scenario: a write by one sink can succeed while the write by the other 
> sink fails, so a batch of messages gets delivered to the output 
> partition out of order. 
> 
> Can someone please clarify what happens in this case? Is there an 
> ordering guarantee? Can this streams be merged while preserving 
> ordering?
> 
> I know that regular Source.merge() doesn't preserve ordering, but in 
> this case I know that there's no repartitioning etc, and messages 
> basically appear on the same "tick", so it feels like there should be a 
> way to do this. Can I keep ordering if I replace my transformers with 
> processors and manually connect them to the same sink?
> 
> 
> --
> Best regards,
> Vasily Sulatskov
> 
>


Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
ing position is 712713
> Restored from sensors-stream-aggregate-store-changelog-1 to aggregate-store
> with 18 records, ending offset is 3024261, next starting position is 3024262
> 
> 
> why it first says it didn't find the checkpoint and then it does find it?
> It seems it loaded about  2.7M records (sum of offset difference in the
> "restorting partition " messages) right?
> Maybe should I try to reduce the checkpoint interval?
> 
> Regards
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Mon, Dec 2, 2019 at 9:18 AM John Roesler  wrote:
> 
> > Hi Alessandro,
> >
> > I'm sorry to hear that.
> >
> > The restore process only takes one factor into account: the current offset
> > position of the changelog topic is stored in a local file alongside the
> > state stores. On startup, the app checks if the recorded position lags the
> > latest offset in the changelog. If so, then it reads the missing changelog
> > records before starting processing.
> >
> > Thus, it would not restore any old window data.
> >
> > There might be a few different things going on to explain your observation:
> > * if there is more than one instance in your Streams cluster, maybe the
> > task is "flopping" between instances, so each instance still has to recover
> > state, since it wasn't the last one actively processing it.
> > * if the application isn't stopped gracefully, it might not get a chance
> > to record its offset in that local file, so on restart it has to restore
> > some or all of the state store from changelog.
> >
> > Or it could be something else; that's just what comes to mind.
> >
> > If you want to get to the bottom of it, you can take a look at the logs,
> > paying close attention to which tasks are assigned to which instances after
> > each restart. You can also look into the logs from
> > `org.apache.kafka.streams.processor.internals.StoreChangelogReader` (might
> > want to set it to DEBUG or TRACE level to really see what's happening).
> >
> > I hope this helps!
> > -John
> >
> > On Sun, Dec 1, 2019, at 21:25, Alessandro Tagliapietra wrote:
> > > Hello everyone,
> > >
> > > we're having a problem with bandwidth usage on streams application
> > startup,
> > > our current setup does this:
> > >
> > > ...
> > > .groupByKey()
> > > .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
> > > .aggregate(
> > > { MetricSequenceList(ArrayList()) },
> > > { key, value, aggregate ->
> > > aggregate.getRecords().add(value)
> > > aggregate
> > > },
> > > Materialized.`as`<String, MetricSequenceList, WindowStore<Bytes,
> > ByteArray>>("aggregate-store").withKeySerde(Serdes.String()).withValueSerde(Settings.getValueSpecificavroSerde())
> > > )
> > > .toStream()
> > > .flatTransform(TransformerSupplier {
> > > ...
> > >
> > > basically in each window we append the new values and then do some other
> > > logic with the array of windowed values.
> > > The aggregate-store changelog topic configuration  uses compact,delete as
> > > cleanup policy and has 12 hours of retention.
> > >
> > > What we've seen is that on application startup it takes a couple minutes
> > to
> > > rebuild the state store, even if the state store directory is persisted
> > > across restarts. That along with an exception that caused the docker
> > > container to be restarted a couple hundreds times caused a big confluent
> > > cloud bill compared to what we usually spend (1/4 of a full month in 1
> > day).
> > >
> > > What I think is happening is that the topic is keeping all the previous
> > > windows even with the compacting policy because each key is the original
> > > key + the timestamp not just the key. Since we don't care about previous
> > > windows as the flatTransform after the toStream() makes sure that we
> > don't
> > > process old windows (a custom suppressor basically) is there a way to
> > only
> > > keep the last window so that the store rebuilding goes faster and without
> > > rebuilding old windows too? Or should I create a custom window using the
> > > original key as key so that the compaction keeps only the last window
> > data?
> > >
> > > Thank you
> > >
> > > --
> > > Alessandro Tagliapietra
> > >
> >
>


Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
Hey Alessandro,

That sounds also like it would work. I'm wondering if it would actually change 
what you observe w.r.t. recovery behavior, though. Streams already sets the 
retention time on the changelog to equal the retention time of the windows, for 
windowed aggregations, so you shouldn't be loading a lot of window data for old 
windows you no longer care about.

Have you set the "grace period" on your window definition? By default, it is 
set to 24 hours, but you can set it as low as you like. E.g., if you want to 
commit to having in-order data only, then you can set the grace period to zero. 
This _should_ let the broker clean up the changelog records as soon as the 
window ends. 

Of course, the log cleaner doesn't run all the time, so there's some extra 
delay in which "expired" data would still be visible in the changelog, but it 
would actually be just the same as if you manage the store yourself.

Hope this helps!
-John

On Tue, Dec 3, 2019, at 22:22, Alessandro Tagliapietra wrote:
> Thanks John for the explanation,
> 
> I thought that with EOS enabled (which we have) it would in the worst case
> find a valid checkpoint and start the restore from there until it reached
> the last committed status, not completely from scratch. What you say
> definitely makes sense now.
> Since we don't really need old time windows and we ensure data is ordered
> when processed I think I"ll just write a custom transformer to keep only
> the last window, store intermediate aggregation results in the store and
> emit a new value only when we receive data belonging to a new window.
> That with a compact only changelog topic should keep the rebuild data to
> the minimum as it would have only the last value for each key.
> 
> Hope that makes sense
> 
> Thanks again
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Tue, Dec 3, 2019 at 3:04 PM John Roesler  wrote:
> 
> > Hi Alessandro,
> >
> > To take a stab at your question, maybe it first doesn't find it, but then
> > restores some data, writes the checkpoint, and then later on, it has to
> > re-initialize the task for some reason, and that's why it does find a
> > checkpoint then?
> >
> > More to the heart of the issue, if you have EOS enabled, Streams _only_
> > records the checkpoint when the store is in a known-consistent state. For
> > example, if you have a graceful shutdown, Streams will flush all the
> > stores, commit all the transactions, and then write the checkpoint file.
> > Then, on re-start, it will pick up from that checkpoint.
> >
> > But as soon as it starts processing records, it removes the checkpoint
> > file, so if it crashes while it was processing, there is no checkpoint file
> > there, and it will have to restore from the beginning of the changelog.
> >
> > This design is there on purpose, because otherwise we cannot actually
> > guarantee correctness... For example, if you are maintaining a count
> > operation, and we process an input record i, increment the count and write
> > it to the state store, and to the changelog topic. But we crash before we
> > commit that transaction. Then, the write to the changelog would be aborted,
> > and we would re-process record i . However, we've already updated the local
> > state store, so when we increment it again, it results in double-counting
> > i. The key point here is that there's no way to do an atomic operation
> > across two different systems (local state store and the changelog topic).
> > Since we can't guarantee that we roll back the incremented count when the
> > changelog transaction is aborted, we can't keep the local store consistent
> > with the changelog.
> >
> > After a crash, the only way to ensure the local store is consistent with
> > the changelog is to discard the entire thing and rebuild it. This is why we
> > have an invariant that the checkpoint file only exists when we _know_ that
> > the local store is consistent with the changelog, and this is why you're
> > seeing so much bandwidth when re-starting from an unclean shutdown.
> >
> > Note that it's definitely possible to do better than this, and we would
> > very much like to improve it in the future.
> >
> > Thanks,
> > -John
> >
> > On Tue, Dec 3, 2019, at 16:16, Alessandro Tagliapietra wrote:
> > > Hi John,
> > >
> > > thanks a lot for helping, regarding your message:
> > >  - no we only have 1 instance of the stream application, and it always
> > > re-uses the same state folder
> > >  - yes we're seeing most issues when restarting not gracefully due
>

Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
Oh, yeah, I remember that conversation!

Yes, then, I agree, if you're only storing state of the most recent window for 
each key, and the key you use for that state is actually the key of the 
records, then an aggressive compaction policy plus your custom transformer 
seems like a good way forward.

What I was referring to is that, in Streams, the keys for window aggregation 
state are actually composed of both the window itself and the key. In the DSL, 
it looks like "Windowed<K>". That results in the store having a unique key per 
window for each K, which is why we need retention as well as compaction for our 
changelogs. But for you, if you just make the key "K", then compaction alone 
should do the trick.
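The difference is easy to see with a toy model of compaction (plain Java, illustrative only): window-qualified keys never collide, so every window's record survives, while plain keys collapse to a single record per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of log compaction: the compacted changelog keeps only the last
// value per key, which is all that must be replayed on restore.
class CompactionDemo {
    static Map<String, String> compacted(String[][] changelog) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : changelog) {
            latest.put(record[0], record[1]); // later values overwrite earlier ones
        }
        return latest;
    }

    public static void main(String[] args) {
        // Keys shaped like Windowed<K> are all distinct, so nothing compacts away:
        String[][] windowedKeys = {
            {"sensor-1@window-0", "v1"}, {"sensor-1@window-1", "v2"},
            {"sensor-1@window-2", "v3"},
        };
        // Plain keys collapse to the latest window's state only:
        String[][] plainKeys = {
            {"sensor-1", "v1"}, {"sensor-1", "v2"}, {"sensor-1", "v3"},
        };
        System.out.println(compacted(windowedKeys).size()); // 3 records to restore
        System.out.println(compacted(plainKeys).size());    // 1 record to restore
    }
}
```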

And yes, if you manage the topic yourself, then Streams won't adjust the 
retention time. I think it might validate that the retention isn't too short, 
but I don't remember offhand.

Cheers, and let me know how it goes!
-John

On Tue, Dec 3, 2019, at 23:03, Alessandro Tagliapietra wrote:
> Hi John,
> 
> afaik grace period uses stream time
> https://kafka.apache.org/21/javadoc/org/apache/kafka/streams/kstream/Windows.html
> which is
> per partition, unfortunately we process data that's not in sync between
> keys so each key needs to be independent and a key can have much older 
> data
> than the other.
> 
> Having a small grace period would probably close old windows sooner than
> expected. That's also why in my use case a custom store that just stores
> the last window data for each key might work better. I had the same issue
> with suppression and it has been reported here
> https://issues.apache.org/jira/browse/KAFKA-8769
> Oh I just saw that you're the one that helped me on slack and created the
> issue (thanks again for that).
> 
> Is the behavior you mention, about Streams setting the changelog retention
> time, something it does on creation of the topic when the broker has auto
> creation enabled? Because we're using Confluent Cloud and I had to create
> it manually.
> Regarding the change in the recovery behavior, with compact cleanup policy
> shouldn't the changelog only keep the last value? That would make the
> recovery faster and "cheaper" as it would only need to read a single value
> per key (if the cleanup just happened) right?
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Tue, Dec 3, 2019 at 8:51 PM John Roesler  wrote:
> 
> > Hey Alessandro,
> >
> > That sounds also like it would work. I'm wondering if it would actually
> > change what you observe w.r.t. recovery behavior, though. Streams already
> > sets the retention time on the changelog to equal the retention time of the
> > windows, for windowed aggregations, so you shouldn't be loading a lot of
> > window data for old windows you no longer care about.
> >
> > Have you set the "grace period" on your window definition? By default, it
> > is set to 24 hours, but you can set it as low as you like. E.g., if you
> > want to commit to having in-order data only, then you can set the grace
> > period to zero. This _should_ let the broker clean up the changelog records
> > as soon as the window ends.
> >
> > Of course, the log cleaner doesn't run all the time, so there's some extra
> > delay in which "expired" data would still be visible in the changelog, but
> > it would actually be just the same as if you manage the store yourself.
> >
> > Hope this helps!
> > -John
> >
> > On Tue, Dec 3, 2019, at 22:22, Alessandro Tagliapietra wrote:
> > > Thanks John for the explanation,
> > >
> > > I thought that with EOS enabled (which we have) it would in the worst
> > case
> > > find a valid checkpoint and start the restore from there until it reached
> > > the last committed status, not completely from scratch. What you say
> > > definitely makes sense now.
> > > Since we don't really need old time windows and we ensure data is ordered
> > > when processed I think I'll just write a custom transformer to keep only
> > > the last window, store intermediate aggregation results in the store and
> > > emit a new value only when we receive data belonging to a new window.
> > > That with a compact only changelog topic should keep the rebuild data to
> > > the minimum as it would have only the last value for each key.
> > >
> > > Hope that makes sense
> > >
> > > Thanks again
> > >
> > > --
> > > Alessandro Tagliapietra
> > >
> > >
> > > On Tue, Dec 3, 2019 at 3:04 PM John Roesler  wrote:
> > >
>

Re: Reducing streams startup bandwidth usage

2019-12-04 Thread John Roesler
Oh, good!

On Tue, Dec 3, 2019, at 23:29, Alessandro Tagliapietra wrote:
> Testing on staging shows that a restart on exception is much faster and the
> stream starts right away which I think means we're reading way less data
> than before!
> 
> What I was referring to is that, in Streams, the keys for window
> > aggregation state is actually composed of both the window itself and the
> > key. In the DSL, it looks like "Windowed". That results in the store
> > having a unique key per window for each K, which is why we need retention
> > as well as compaction for our changelogs. But for you, if you just make the
> > key "K", then compaction alone should do the trick.
> 
> Yes we had compact,delete as cleanup policy but probably it still had a too
> long retention value. Also the RocksDB store is probably much faster now
> having only one entry per key instead of one entry per window per key.
> 
> Thanks a lot for helping! I'm now going to setup a prometheus-jmx
> monitoring so we can keep better track of what's going on :)
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Tue, Dec 3, 2019 at 9:12 PM John Roesler  wrote:
> 
> > Oh, yeah, I remember that conversation!
> >
> > Yes, then, I agree, if you're only storing state of the most recent window
> > for each key, and the key you use for that state is actually the key of the
> > records, then an aggressive compaction policy plus your custom transformer
> > seems like a good way forward.
> >
> > What I was referring to is that, in Streams, the keys for window
> > aggregation state is actually composed of both the window itself and the
> > key. In the DSL, it looks like "Windowed". That results in the store
> > having a unique key per window for each K, which is why we need retention
> > as well as compaction for our changelogs. But for you, if you just make the
> > key "K", then compaction alone should do the trick.
> >
> > And yes, if you manage the topic yourself, then Streams won't adjust the
> > retention time. I think it might validate that the retention isn't too
> > short, but I don't remember offhand.
> >
> > Cheers, and let me know how it goes!
> > -John
> >
> > On Tue, Dec 3, 2019, at 23:03, Alessandro Tagliapietra wrote:
> > > Hi John,
> > >
> > > afaik grace period uses stream time
> > >
> > https://kafka.apache.org/21/javadoc/org/apache/kafka/streams/kstream/Windows.html
> > > which is
> > > per partition, unfortunately we process data that's not in sync between
> > > keys so each key needs to be independent and a key can have much older
> > > data
> > > than the other.
> > >
> > > Having a small grace period would probably close old windows sooner than
> > > expected. That's also why in my use case a custom store that just stores
> > > the last window data for each key might work better. I had the same issue
> > > with suppression and it has been reported here
> > > https://issues.apache.org/jira/browse/KAFKA-8769
> > > Oh I just saw that you're the one that helped me on slack and created the
> > > issue (thanks again for that).
> > >
> > > The behavior that you mention about streams setting changelog retention
> > > time is something they do on creation of the topic when the broker has
> > auto
> > > creation enabled? Because we're using confluent cloud and I had to create
> > > it manually.
> > > Regarding the change in the recovery behavior, with compact cleanup
> > policy
> > > shouldn't the changelog only keep the last value? That would make the
> > > recovery faster and "cheaper" as it would only need to read a single
> > value
> > > per key (if the cleanup just happened) right?
> > >
> > > --
> > > Alessandro Tagliapietra
> > >
> > >
> > > On Tue, Dec 3, 2019 at 8:51 PM John Roesler  wrote:
> > >
> > > > Hey Alessandro,
> > > >
> > > > That sounds also like it would work. I'm wondering if it would actually
> > > > change what you observe w.r.t. recovery behavior, though. Streams
> > already
> > > > sets the retention time on the changelog to equal the retention time
> > of the
> > > > windows, for windowed aggregations, so you shouldn't be loading a lot
> > of
> > > > window data for old windows you no longer care about.
> > > >
> > > > Have you set the "grace period

Re: How to set concrete names for state stores and internal topics backed by these

2019-12-06 Thread John Roesler
Hi Sachin,

The way that Java infers generic arguments makes that case particularly 
obnoxious.

By the way, the problem you're facing is specifically addressed by these 
relatively new features:
* 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
* 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping

Since this behavior has been under development recently, I thought you might 
benefit from the context.

To answer your question, what you have to do is explicitly mention the type 
arguments to "Materialized.as(name)" when you're chaining withKeySerde, etc.

It will look something like this:

Materialized
  .<KeyType, ValueType, KeyValueStore<Bytes, byte[]>>as("store-name")
  .withKeySerde(new Serde...)
  .withValueSerde(new Serde...));

I can explain exactly why this is necessary if you want, but the short answer 
is that the Java type system only makes a rudimentary effort to infer types.

FWIW, this "paper cut" makes me irrationally angry, and I'm hoping we can find 
a way to fix it, if we ever change the Materialized builder interface.
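For the curious, the inference limitation can be reproduced without Kafka at all. The sketch below uses a made-up Mat builder that only mimics the shape of Materialized: Java infers a generic static method's type parameters from the assignment target when the call is the whole expression, but chaining another method onto the call makes inference fall back to the declared bounds (Object here):

```java
public class InferenceDemo {

    // Hypothetical stand-in for Materialized, just to show the type-system issue.
    static class Mat<K, V> {
        final String name;
        String keySerde;

        Mat(String name) { this.name = name; }

        static <K, V> Mat<K, V> as(String name) { return new Mat<>(name); }

        Mat<K, V> withKeySerde(String serde) { this.keySerde = serde; return this; }
    }

    public static void main(String[] args) {
        // Works: the target type Mat<String, Long> drives inference.
        Mat<String, Long> a = Mat.as("store");
        System.out.println(a.name); // store

        // Does NOT compile if uncommented: the chained call types
        // Mat.as("store") in isolation, inferring Mat<Object, Object>.
        // Mat<String, Long> b = Mat.as("store").withKeySerde("stringSerde");

        // The fix, as in the snippet above: an explicit type witness.
        Mat<String, Long> c = Mat.<String, Long>as("store").withKeySerde("stringSerde");
        System.out.println(c.name + " / " + c.keySerde); // store / stringSerde
    }
}
```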

Hope this helps,
-John

On Fri, Dec 6, 2019, at 11:15, Sachin Mittal wrote:
> Hi,
> In my application I have names of internal topics like this:
> 
> ss-session-application-KSTREAM-JOINOTHER-59-store-changelog-0
> ss-session-application-KSTREAM-JOINTHIS-49-store-changelog-0
> ss-session-application-KSTREAM-OUTEROTHER-50-store-changelog-0
> ss-session-application-KTABLE-MERGE-STATE-STORE-61-changelog-0
> 
> Is it possible to set concrete names for these instead of say **
> KSTREAM-JOINOTHER-59-store**
> 
> This way I can identify at what code in my DSL is responsible for data
> inside them.
> 
> So far I have set names for:
> Grouped.with
> Materialized.as
> Joined.with
> 
> This has helped me get concrete names at many places however still at some
> places I see arbitrary names.
> 
> Also note that somehow this code works
> Materialized.with(new JSONSerde(), new TupleJSONSerde>())
> 
> But not:
> Materialized.as("d-l-i-store").withKeySerde(new
> JSONSerde()).withValueSerde(new TupleJSONSerde>())
> 
> The error I get is:
> Description Resource Path Location Type
> The method withKeySerde(Serde) in the type
> Materialized is not applicable for the arguments
> (JSONSerde)
> 
> I have my class
> 
> class JSONSerde<T> implements Serializer<T>,
> Deserializer<T>, Serde<T> {
> ..
> }
> 
> This is pretty much same as from kafka streams typed example.
> 
> Thanks
> Sachin
>


Re: Kafka Streams Topology describe format

2019-12-06 Thread John Roesler
Hi again, Sachin,

I highly recommend this tool for helping to understand the topology 
description: https://zz85.github.io/kafka-streams-viz/

I think your interpretation of the format is pretty much spot-on.

Hope this helps,
-John

On Fri, Dec 6, 2019, at 12:21, Sachin Mittal wrote:
> Hi,
> I am just posting a section of my topology to basically understand what
> describe method actually displays.
> 
> What can we understand just by looking at the topology (like what do the -->
> and <-- arrows represent)?
> 
> --
> Source: KSTREAM-SOURCE-43 (topics:
> [price-change-left-repartition])
>   --> KSTREAM-WINDOWED-47
> Source: KSTREAM-SOURCE-46 (topics:
> [price-change-right-repartition])
>   --> KSTREAM-WINDOWED-48
> Processor: KSTREAM-WINDOWED-47 (stores:
> [KSTREAM-JOINTHIS-49-store])
>   --> KSTREAM-JOINTHIS-49
>   <-- KSTREAM-SOURCE-43
> Processor: KSTREAM-WINDOWED-48 (stores:
> [KSTREAM-OUTEROTHER-50-store])
>   --> KSTREAM-OUTEROTHER-50
>   <-- KSTREAM-SOURCE-46
> Processor: KSTREAM-JOINTHIS-49 (stores:
> [KSTREAM-OUTEROTHER-50-store])
>   --> KSTREAM-MERGE-51
>   <-- KSTREAM-WINDOWED-47
> Processor: KSTREAM-OUTEROTHER-50 (stores:
> [KSTREAM-JOINTHIS-49-store])
>   --> KSTREAM-MERGE-51
>   <-- KSTREAM-WINDOWED-48
> -
> 
> What I basically understand is that there are 2 sources converted into
> windowed sources using two of the processors.
> There are 2 more subsequent processors which take some input and produce
> some output.
> The ones marked by --> are the inputs and the ones marked by <-- are the
> outputs.
> And the string next to Processor: or Source: is the name of the processor
> or input of the source.
> 
> Is my understanding correct, or am I missing something?
> 
> Thanks
> Sachin
>


Re: What timestamp is used by streams when doing windowed joins

2019-12-06 Thread John Roesler
Hi Sachin,

I'd need more information to speculate about why your records are missing, but 
it sounds like you're suspecting something to do with the records' timestamps, 
so I'll just focus on answering your questions.

Streams always uses the same timestamp for all operations, which is the 
timestamp returned by the timestamp extractor. Whether this is event time or 
ingestion time is up to the timestamp extractor you're using.

If you're using the default timestamp extractor, then Streams will use the 
timestamp field on the ConsumerRecord that comes back from the broker. If 
you're using CreateTime, then it would hold the value of the timestamp written 
by the producer. If you're using LogAppendTime, then it's the timestamp 
representing when the broker actually adds the record to the topic.

One potential point of confusion is that when we say a "record", we mean more 
than just the key and value that you typically manipulate using the Streams 
DSL. In addition to these fields, there is a separate timestamp field, which is 
part of the Consumer/Producer/Broker protocols. That's what we use for time 
tracking, so you do not need to worry about embedding and extracting the 
timestamp in your values.
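As a toy model of that separation (the class names here are made up, not the real client API), the timestamp rides along as record metadata next to the key and value, and an extractor just reads it:

```java
public class TimestampSketch {

    // Like a ConsumerRecord: key and value plus a metadata timestamp.
    record Record(String key, String value, long timestampMs) {}

    // The shape of a timestamp extractor: record in, epoch millis out.
    interface Extractor {
        long extract(Record r);
    }

    public static void main(String[] args) {
        // Mirrors the default behavior: use the metadata timestamp, which is
        // CreateTime or LogAppendTime depending on the topic configuration.
        Extractor defaultExtractor = r -> r.timestampMs();

        Record r = new Record("sensor-1", "{\"temp\":21.5}", 1_575_700_000_000L);
        System.out.println(defaultExtractor.extract(r)); // 1575700000000
    }
}
```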

Streams will set the timestamp field on outgoing ProducerRecords it sends to 
the broker, so this would just be used by default for further stages in the 
pipeline. You don't need to add timestamp extractors further on.

The only usage of processing time (aka "wall-clock time") is in wall-clock 
based punctuation, if you're using the low-level Processor API. Also, the 
commit interval is defined in terms of wall-clock time. If all you're 
considering is the semantics of the Streams DSL, processing/wall-clock time 
would not play any part in those semantics.

I know that stream processing literature in general discusses event- vs. 
processing- vs. ingestion-time quite a bit, but for practical purposes, event 
time (either CreateTime or LogAppendTime) is the one that's useful for writing 
programs. Both ingestion time and processing time lead to non-deterministic 
programs with unclear semantics. That's why we pretty much stick to event time 
in the Streams DSL.

Finally, yeah, if you just want to process records in the same order they 
appear in the topics, then LogAppendTime might be better. 

I hope this helps clear things up a bit.

Thanks,
-John

On Fri, Dec 6, 2019, at 22:20, Sachin Mittal wrote:
> Hi,
> I have noticed some issues when doing stream to stream windowed joins.
> Looks like my joined stream does not include all the records.
> 
> Say I am doing join like this:
> stream1.join(
> stream2,
> (lv, rv) -> ...,
> JoinWindows.of(Duration.ofMinutes(5)),
>)
> What I have checked from the docs is that it will join 2 records within the
> specified window.
> However it's not clear what timestamp it uses for each record.
> Would it be
> 1.  event-time or
> 2. processing-time or
> 3. ingestion-time
> 
> I am right now using default configuration for
> log.message.timestamp.type = CreateTime and default.timestamp.extractor
> 
> From the docs I gather that in the default case it uses event-time.
> So does it mean that there has to be a timestamp field in the record which
> is to be extracted by custom timestamp extractor?
> 
> Also downstream, when the streams application actually writes (produces) new
> record types, do we need to provide a timestamp extractor for all such record
> types
> so the next process in the pipeline can pick up the timestamp to do the
> windowed operations?
> 
> Also, when and how is processing time used at all by a streams application?
> 
> Finally, say I don't want to worry about whether the timestamp is set by the
> producers, is it better to simply set
> log.message.timestamp.type =  LogAppendTime
> 
> Thanks
> Sachin
>


Re: Reducing streams startup bandwidth usage

2019-12-07 Thread John Roesler
Ah, yes. Glad you figured it out!

Caching does not reduce EOS guarantees at all. I highly recommend using it. You 
might even want to take a look at the caching metrics to make sure you have a 
good hit ratio. 
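To illustrate what the cache buys you, here is a toy model (made-up code, not the actual Streams cache implementation): updates are coalesced per key in memory, and only the latest value per key is forwarded to the changelog and downstream when the cache flushes (which Streams does on commit or eviction):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CacheSketch {

    // Flush a batch of updates through a cache: last value per key wins.
    static List<Map.Entry<String, Integer>> flushCoalesced(List<Map.Entry<String, Integer>> updates) {
        Map<String, Integer> cache = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> u : updates) {
            cache.put(u.getKey(), u.getValue()); // overwrite: coalesces per key
        }
        return new ArrayList<>(cache.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> updates = List.of(
                Map.entry("sensor-1", 1),
                Map.entry("sensor-1", 2),
                Map.entry("sensor-2", 7),
                Map.entry("sensor-1", 3));

        // Without caching: every update becomes a changelog record.
        System.out.println(updates.size());          // 4
        // With caching: only the latest value per key survives the flush.
        System.out.println(flushCoalesced(updates)); // [sensor-1=3, sensor-2=7]
    }
}
```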

-John

On Sat, Dec 7, 2019, at 10:51, Alessandro Tagliapietra wrote:
> Never mind I've found out I can use `.withCachingEnabled` on the store
> builder to achieve the same thing as the windowing example as
> `Materialized.as` turns that on by default.
> 
> Does caching in any way reduce the EOS guarantees?
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Sat, Dec 7, 2019 at 1:12 AM Alessandro Tagliapietra <
> tagliapietra.alessan...@gmail.com> wrote:
> 
> > Seems my journey with this isn't done just yet,
> >
> > This seems very complicated to me but I'll try to explain it as best I can.
> > To better understand the streams network usage I've used prometheus with
> > the JMX exporter to export kafka metrics.
> > To check the amount of data we use I'm looking at the increments
> > of kafka_producer_topic_metrics_byte_total and
> > kafka_producer_producer_topic_metrics_record_send_total,
> >
> > Our current (before the change mentioned above) code looks like this:
> >
> > // This transformer just pairs a value with the previous one, storing the
> > temporary one in a store
> > val pairsStream = metricStream
> >   .transformValues(ValueTransformerWithKeySupplier { PairTransformer() },
> > "LastValueStore")
> >   .filter { _, value: MetricSequence? -> value != null }
> >
> > // Create a store to store suppressed windows until a new one is received
> > val suppressStoreSupplier =
> > Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("suppress-store"),
> > ..
> >
> > // Window and aggregate data in 1 minute intervals
> > val aggregatedStream = pairsStream
> >   .groupByKey()
> >   .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
> >   .aggregate(
> >   { MetricSequenceList(ArrayList()) },
> >   { key, value, aggregate ->
> >   aggregate.getRecords().add(value)
> >   aggregate
> >   },
> >   Materialized.`as`<String, MetricSequenceList, WindowStore<Bytes, ByteArray>>("aggregate-store").withKeySerde(Serdes.String()).withValueSerde(Settings.getValueSpecificavroSerde())
> >   )
> >   .toStream()
> >   .flatTransform(TransformerSupplier {
> >   // This transformer basically waits until a new window is received
> > to emit the previous one
> >   }, "suppress-store")
> >   .map { sensorId: String, suppressedOutput: SuppressedOutput ->
> >    etc 
> >
> >
> > Basically:
> >  - all data goes through LastValueStore store that stores each message and
> > emits a pair with the previous one
> >  - the aggregate-store is used to store the per-window list of messages in
> > the aggregate method
> >  - the suppress store is used to store each received window which is
> > emitted only after a newer one is received
> >
> > What I'm experiencing is that:
> >  - during normal execution, the streams app sends to the lastvalue store
> > changelog topic 5k messages/min, the aggregate and suppress store changelog
> > topics only about 100
> >  - at some point (after many hours of operation), the streams app starts
> > sending to the aggregate and suppress store changelog topic the same amount
> > of messages going through the lastvaluestore
> >  - if I restart the streams app it goes back to the initial behavior
> >
> > You can see the behavior in this graph https://imgur.com/dJcUNSf
> > You can also see that after a restart everything goes back to normal
> > levels.
> > Regarding other metrics, process latency increases, poll latency
> > decreases, poll rate decreases, commit rate stays the same while commit
> > latency increases.
> >
> > Now, I've these questions:
> >  - why isn't the aggregate/suppress store changelog topic throughput the
> > same as the LastValueStore? Shouldn't every time it aggregates send a
> > record to the changelog?
> >  - is the windowing doing some internal caching like not sending every
> > aggregation record until the window time is passed? (if so, where can I
> > find that code since I would like to use that also for our new
> > implementation)
> >
> > Thank you in advance
> >
> > --
> > Alessandro Tagliapietra
> >
> >
> > On Wed, Dec 4, 2019 at 7:57 AM John Roesler  wrote:
> >
> >> Oh, good!
> 

Re: Reducing streams startup bandwidth usage

2019-12-07 Thread John Roesler
Hmm, that’s a good question. Now that we’re talking about caching, I wonder if 
the cache was just too small. It’s not very big by default. 

On Sat, Dec 7, 2019, at 11:16, Alessandro Tagliapietra wrote:
> Ok I'll check on that!
> 
> Now I can see that with caching we went from 3-4MB/s to 400KB/s, which will
> help with the bill.
> 
> Last question, any reason why after a while the regular windowed stream
> starts sending every update instead of caching?
> Could it be because it doesn't have any more memory available? Any other
> possible reason?
> 
> Thank you so much for your help
> 
> --
> Alessandro Tagliapietra
> 
> 
> On Sat, Dec 7, 2019 at 9:14 AM John Roesler  wrote:
> 
> > Ah, yes. Glad you figured it out!
> >
> > Caching does not reduce EOS guarantees at all. I highly recommend using
> > it. You might even want to take a look at the caching metrics to make sure
> > you have a good hit ratio.
> >
> > -John
> >
> > On Sat, Dec 7, 2019, at 10:51, Alessandro Tagliapietra wrote:
> > > Never mind I've found out I can use `.withCachingEnabled` on the store
> > > builder to achieve the same thing as the windowing example as
> > > `Materialized.as` turns that on by default.
> > >
> > > Does caching in any way reduces the EOS guarantees?
> > >
> > > --
> > > Alessandro Tagliapietra
> > >
> > >
> > > On Sat, Dec 7, 2019 at 1:12 AM Alessandro Tagliapietra <
> > > tagliapietra.alessan...@gmail.com> wrote:
> > >
> > > > Seems my journey with this isn't done just yet,
> > > >
> > > > This seems very complicated to me but I'll try to explain it as best I
> > can.
> > > > To better understand the streams network usage I've used prometheus
> > with
> > > > the JMX exporter to export kafka metrics.
> > > > To check the amount of data we use I'm looking at the increments
> > > > of kafka_producer_topic_metrics_byte_total and
> > > > kafka_producer_producer_topic_metrics_record_send_total,
> > > >
> > > > Our current (before the change mentioned above) code looks like this:
> > > >
> > > > // This transformers just pairs a value with the previous one storing
> > the
> > > > temporary one in a store
> > > > val pairsStream = metricStream
> > > >   .transformValues(ValueTransformerWithKeySupplier { PairTransformer()
> > },
> > > > "LastValueStore")
> > > >   .filter { _, value: MetricSequence? -> value != null }
> > > >
> > > > // Create a store to store suppressed windows until a new one is
> > received
> > > > val suppressStoreSupplier =
> > > >
> > Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("suppress-store"),
> > > > ..
> > > >
> > > > // Window and aggregate data in 1 minute intervals
> > > > val aggregatedStream = pairsStream
> > > >   .groupByKey()
> > > >   .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
> > > >   .aggregate(
> > > >   { MetricSequenceList(ArrayList()) },
> > > >   { key, value, aggregate ->
> > > >   aggregate.getRecords().add(value)
> > > >   aggregate
> > > >   },
> > > >   Materialized.`as`<String, MetricSequenceList,
> > WindowStore<Bytes, ByteArray>>("aggregate-store").withKeySerde(Serdes.String()).withValueSerde(Settings.getValueSpecificavroSerde())
> > > >   )
> > > >   .toStream()
> > > >   .flatTransform(TransformerSupplier {
> > > >   // This transformer basically waits until a new window is
> > received
> > > > to emit the previous one
> > > >   }, "suppress-store")
> > > >   .map { sensorId: String, suppressedOutput: SuppressedOutput ->
> > > >    etc 
> > > >
> > > >
> > > > Basically:
> > > >  - all data goes through LastValueStore store that stores each message
> > and
> > > > emits a pair with the previous one
> > > >  - the aggregate-store is used to store the per-window list of
> > messages in
> > > > the aggregate method
> > > >  - the suppress store is used to store each received window which is
> > > > emitted only after a newer one is received
> > > >
> > > > What I'm experiencing is that:

Re: How to set concrete names for state stores and internal topics backed by these

2019-12-07 Thread John Roesler
Hi Sachin,

I’m glad it helped!

What you have in mind is a good thing to do.

One thing to watch out for is _not_ to add names using Materialized for KTable 
operations that otherwise would not create a store. For example, if you filter 
or mapValues a KTable, those operations usually do not actually require storing 
any state. But if you add a name with Materialized, you’re telling Streams to 
actually create a state store and materialize the table. What I would do is 
write the topology without names first, then use topology.describe() to figure 
out which actual stores are needed, and then name them. This is an area I have 
some plans to improve. 

To answer your question, if you didn’t need serdes before, you don’t need them 
when you just use Materialized.as(name). Streams should continue to pass down 
the serdes through the program where it can. 

Hope this answers your question. 
-John

On Sat, Dec 7, 2019, at 03:20, Sachin Mittal wrote:
> Hi John,
> This was very helpful. However I am still confused about when to set the
> names for Materialized and Grouped.
> I am basically setting the names because to have definite names of state
> stores and internal topics identifiable for debugging purpose.
> 
> So when we set a name, do we also need to set serde for key/value type?
> If not then what defaults are used by them?
> 
> I'll just explain by quick example:
> My original code was:
> table = stream.map((k, v) -> ...).groupByKey().reduce((av, nv) -> nv)
> 
> In order to set some names to the intermediate stores/topics I changed the
> code as:
> table = stream.map((k, v) -> ...).groupByKey(Grouped.with("group",
> Serde, Serde)).reduce((av, nv) -> nv, Materialized.as("store"))
> 
> So I wanted to know once I create a named Materialized do I need to set its
> key/value serde too?
> so is this the better code
> table = stream
>   .map((k, v) -> ...)
>   .groupByKey(Grouped.with("group", Serde, Serde))
>   .reduce((av, nv) -> nv, Materialized.<Key, Value, KeyValueStore<Bytes, byte[]>>as("store-name").withKeySerde(Serde).withValueSerde(Serde)))
> 
> Note that I have custom class for Key and Value.
> 
> Thanks
> Sachin
> 
> 
> 
> On Fri, Dec 6, 2019 at 11:02 PM John Roesler  wrote:
> 
> > Hi Sachin,
> >
> > The way that Java infers generic arguments makes that case particularly
> > obnoxious.
> >
> > By the way, the problem you're facing is specifically addressed by these
> > relatively new features:
> > *
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> > *
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping
> >
> > Since this behavior has been under development recently, I thought you
> > might benefit from the context.
> >
> > To answer your question, what you have to do is explicitly mention the
> > type arguments to "Materialized.as(name)" when you're using the
> > withKeySerde, etc.
> >
> > It will look something like this:
> >
> > Materialized
> >   .<KeyType, ValueType, KeyValueStore<Bytes, byte[]>>as("store-name")
> >   .withKeySerde(new Serde...)
> >   .withValueSerde(new Serde...));
> >
> > I can explain exactly why this is necessary if you want, but the short
> > answer is that the Java type system only makes a rudimentary effort to
> > infer types.
> >
> > FWIW, this "paper cut" makes me irrationally angry, and I'm hoping we can
> > find a way to fix it, if we ever change the Materialized builder interface.
> >
> > Hope this helps,
> > -John
> >
> > On Fri, Dec 6, 2019, at 11:15, Sachin Mittal wrote:
> > > Hi,
> > > In my application I have names of internal topics like this:
> > >
> > > ss-session-application-KSTREAM-JOINOTHER-59-store-changelog-0
> > > ss-session-application-KSTREAM-JOINTHIS-49-store-changelog-0
> > > ss-session-application-KSTREAM-OUTEROTHER-50-store-changelog-0
> > > ss-session-application-KTABLE-MERGE-STATE-STORE-61-changelog-0
> > >
> > > Is it possible to set concrete names for these instead of say **
> > > KSTREAM-JOINOTHER-59-store**
> > >
> > > This way I can identify at what code in my DSL is responsible for data
> > > inside them.
> > >
> > > So far I have set names for:
> > > Grouped.with
> > > Materialized.as
> > > Joined.with
> > >
> > > This has helped me get concrete names at many places however still at
> > some
> > > places I see arbitra

Re: What timestamp is used by streams when doing windowed joins

2019-12-08 Thread John Roesler
Ah,

I didn’t remember that the docs defined the terms that way. Those definitions 
make sense to me. 

Yes, if your topics are configured with LogAppendTime, then when we poll 
records from the topic, the timestamp that comes back attached to the record 
would be the log append time. If you’re using the default TimestampExtractor, 
then that’s the timestamp that Streams would use for the record.

And, yes, your description of JoinWindows seems correct. 
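As a plain-Java sketch of that check (simplified; the real join also involves grace period, retention, and per-partition stream time), two records are joinable when their extracted timestamps differ by at most the JoinWindows size:

```java
import java.time.Duration;

public class JoinWindowSketch {

    // Records join when |leftTs - rightTs| <= window size.
    static boolean withinJoinWindow(long leftTs, long rightTs, Duration window) {
        return Math.abs(leftTs - rightTs) <= window.toMillis();
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        // With LogAppendTime, these would be the broker's append timestamps.
        System.out.println(withinJoinWindow(0L, 299_000L, fiveMinutes)); // true
        System.out.println(withinJoinWindow(0L, 301_000L, fiveMinutes)); // false
    }
}
```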

-John

On Sat, Dec 7, 2019, at 01:19, Sachin Mittal wrote:
> Hi John,
> If I check https://docs.confluent.io/current/streams/concepts.html#time
> It has three notions of time => *event-time*, *processing-time*, and
> *ingestion-time*.
> 
> If I check
> https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-timestamp-extractor
> It says that under default case:
> if log.message.timestamp.type is set to CreateTime then *event-time* is
> used.
> if log.message.timestamp.type is set to LogAppendTime then *ingestion-time*
> is used.
> 
> However what you are saying is that under Steams DSL we always use
> event-time which can either be CreateTime or LogAppendTime.
> 
> Both of the statements make sense to me, but it looks like they are slightly
> different on how they relate the times.
> One basically says
> *event-time* <=> CreateTime
> *ingestion-time* <=> LogAppendTime
> Whereas the other says
> event-time =  CreateTime or  LogAppendTime (depending on your broker/topic
> config).
> 
> Yes, setting log.message.timestamp.type to LogAppendTime seems to make the
> window join work better.
> 
> So what I understand then from JoinWindows.of(Duration.ofMinutes(5)) is that
> when joining two records it checks whether their LogAppendTime values are
> within 5 minutes of each other, and if so they get joined.
> Please let me know if I got this part right?
> 
> Thanks
> Sachin
> 
> 
> 
> 
> On Sat, Dec 7, 2019 at 10:43 AM John Roesler  wrote:
> 
> > Hi Sachin,
> >
> > I'd need more information to speculate about why your records are missing,
> > but it sounds like you're suspecting something to do with the records'
> > timestamps, so I'll just focus on answering your questions.
> >
> > Streams always uses the same timestamp for all operations, which is the
> > timestamp returned by the timestamp extractor. Whether this is event time
> > or ingestion time is up to the timestamp extractor you're using.
> >
> > If you're using the default timestamp extractor, then Streams will use the
> > timestamp field on the ConsumerRecord that comes back from the broker. If
> > you're using CreateTime, then it would hold the value of the timestamp
> > written by the producer. If you're using LogAppendTime, then it's the
> > timestamp representing when the broker actually adds the record to the
> > topic.
> >
> > One potential point of confusion is that when we say a "record", we mean
> > more than just the key and value that you typically manipulate using the
> > Streams DSL. In addition to these fields, there is a separate timestamp
> > field, which is part of the Consumer/Producer/Broker protocols. That's what
> > we use for time tracking, so you do not need to worry about embedding and
> > extracting the timestamp in your values.
> >
> > Streams will set the timestamp field on outgoing ProducerRecords it sends
> > to the broker, so this would just be used by default for further stages in
> > the pipeline. You don't need to add timestamp extractors further on.
> >
> > The only usage of processing time (aka "wall-clock time") is in wall-clock
> > based punctuation, if you're using the low-level Processor API. Also, the
> > commit interval is defined in terms of wall-clock time. If all you're
> > considering is the semantics of the Streams DSL, processing/wall-clock time
> > would not play any part in those semantics.
> >
> > I know that stream processing literature in general discusses event- vs.
> > processing- vs. ingestion-time quite a bit, but for practical purposes,
> > event time (either CreateTime or LogAppendTime) is the one that's useful
> > for writing programs. Both ingestion time and processing time lead to
> > non-deterministic programs with unclear semantics. That's why we pretty
> > much stick to event time in the Streams DSL.
> >
> > Finally, yeah, if you just want to process records in the same order they
> > appear in the topics, then LogAppendTime might be better.
> >
> > I hope this helps clear things up a bit.
> >
> > Thanks,
> > -John

Re: Kafka trunk vs master branch

2019-12-25 Thread John Roesler
Hi Sachin,

Trunk is the basis for development. I’m not sure what master is for, if 
anything. I’ve never used it for anything or even checked it out. 

The numbered release branches are used to develop patch releases.

Releases are created from trunk, PRs should be made against trunk, etc. 

Thanks for asking!
John

On Wed, Dec 25, 2019, at 08:54, Sachin Mittal wrote:
> Hello Folks,
> I just wanted to know what commits goes into what branch.
> 
> I see trunk branch which seems default and latest.
> I also see master branch which seems a bit behind trunk.
> I also see different versions branches like 2.2, 2.3 and 2.4 which are also
> actively updated.
> 
> I wanted to know when forking kafka repo, which is the branch one should
> use base off to build from source or do any active development.
> 
> What is the difference between trunk and master branch?
> Also release branches are created from trunk or master branch?
> 
> Also when issuing a pull request which is the general branch one should use
> as target?
> 
> Thanks
> Sachin
>


Re: designing a streaming task for count and event time difference

2020-01-05 Thread John Roesler
Hey Chris,

Yeah, I think what you’re really looking for is data-driven windowing, which we 
haven’t implemented yet. In lieu of that, you’ll want to build on top of 
session windows. 

What you can do is define an aggregate object similar to what Sachin proposed. 
After the aggregation, you can just filter to only allow results where “open == 
false”. Since you have explicit session end events, I don’t think you need 
suppression. 
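To make this concrete, here is a plain-Java sketch of the accumulate-then-filter idea (no Kafka dependencies; the `Session` fields and event-type strings are assumptions for illustration, not the actual classes from this thread):

```java
public class SessionSketch {
    // Assumed shape of the aggregate; a real Streams app would also need a serde for it.
    static class Session {
        long start;
        long duration;
        long count;
        boolean open;
    }

    // The aggregator step: a start event opens the session, an end event
    // closes it and fixes the duration, anything else just counts.
    static Session aggregate(Session agg, String type, long eventTime) {
        if (type.equals("start")) {
            agg.start = eventTime;
            agg.duration = 0;
            agg.open = true;
        } else if (type.equals("end")) {
            agg.duration = eventTime - agg.start;
            agg.open = false;
        } else {
            agg.count++;
        }
        return agg;
    }

    public static void main(String[] args) {
        Session s = new Session();
        aggregate(s, "start", 100L);
        aggregate(s, "action", 110L);
        aggregate(s, "action", 120L);
        aggregate(s, "end", 150L);
        // The "filter" step: only emit once the session has closed.
        if (!s.open) {
            System.out.println(s.count + "," + s.duration); // prints "2,50"
        }
    }
}
```

In the DSL, the same shape would be a `groupByKey().aggregate(...)` followed by a filter that only passes records where `open == false`.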

Hope this helps,
John 

On Sun, Jan 5, 2020, at 06:36, Chris Madge wrote:
> This is great, thank you very much.  I've read more into session 
> windowing and suppression and they seem to fit my needs perfectly.  I'm 
> struggling to find a method of triggering the window to close early 
> when I receive the end event.
> 
> Maybe I could assign a monotonically increasing identifier each time I 
> see a start event, then re-key including that as part of a compound key 
> and session window by that?  I feel like I may be engineering an 
> anti-pattern where there's something much better already built in.
> 
> On 2020/01/04 17:46:40, Sachin Mittal  wrote: 
> > Try something like this:
> > 
> > stream
> >   .groupBy(
> > (key, value) -> value.userId
> >   )
> >   .aggregate(
> > () -> new Session(),
> > (aggKey, newValue, aggValue) -> {
> >   aggValue.userId = newValue.userId
> >   if (newValue.start) {
> > aggValue.start = newValue.start
> > aggValue.duration = 0
> > aggValue.open = true
> >   }
> >   else if (newValue.end) {
> > aggValue.duration = newValue.end - aggValue.start
> > aggValue.open = false
> >   } else {
> > aggValue.count++
> > aggValue.duration = now() - aggValue.start
> >   }
> >   aggValue
> > }
> >   )
> > 
> > Note you need to have a well defined Session class and Event class with
> > their appropriate serde.
> > So the aggregated stream would have Session with its attributes like start
> > time, duration, count, session open or close.
> > 
> > One thing you need to take care is after a session is closed a new session
> > for the same user can be created again.
> > So you may need to break sessions based on some windowing or as session is
> > closed you store it in some store (be it internal to kafka or some external
> > database) and reset the session object.
> > 
> > Hope this helps.
> > 
> > 
> > 
> > 
> > On Sat, Jan 4, 2020 at 10:02 PM Chris Madge  wrote:
> > 
> > > Hi there,
> > >
> > > It’s my first voyage into stream processing - I’ve tried a few things but
> > > I think I’m struggling to think in the streams way. I wondered if I could
> > > be cheeky and ask if someone could give me some clues as to the correct
> > > design for my first task to get me started?
> > >
> > > I have application events coming in like:
> > >
> > > ,type:start,
> > > ,type:action,
> > > ,type:action,
> > > ,type:action,
> > > ,type:end,
> > >
> > > each one represents a single user session.
> > >
> > > I need to output:
> > > , > > event>,,
> > >
> > > I’m working with event time (specified by the application) and I can’t
> > > trust the application to close sessions/notify gracefully (I’m happy for
> > > those to be thrown out, but cool ideas for alternatives are very 
> > > welcome!).
> > >
> > > Any advice would be much appreciated.
> > >
> > > Chris Madge
> > >
> > 
>


Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-14 Thread John Roesler
Congrats, Colin, Vahid, and Manikumar! A great accomplishment, reflecting your 
great work.

-John

On Tue, Jan 14, 2020, at 11:33, Bill Bejeck wrote:
> Congrats Colin, Vahid and Manikumar! Well deserved.
> 
> -Bill
> 
> On Tue, Jan 14, 2020 at 12:30 PM Gwen Shapira  wrote:
> 
> > Hi everyone,
> >
> > I'm happy to announce that Colin McCabe, Vahid Hashemian and Manikumar
> > Reddy are now members of Apache Kafka PMC.
> >
> > Colin and Manikumar became committers on Sept 2018 and Vahid on Jan
> > 2019. They all contributed many patches, code reviews and participated
> > in many KIP discussions. We appreciate their contributions and are
> > looking forward to many more to come.
> >
> > Congrats Colin, Vahid and Manikumar!
> >
> > Gwen, on behalf of Apache Kafka PMC
> >
>


Re: KTable Suppress not working

2020-01-17 Thread John Roesler
Hi Sushrut,

That's frustrating... I haven't seen that before, but looking at the error
in combination with what you say happens without suppress makes
me think there's a large volume of data involved here. Probably,
the problem isn't specific to suppression, but it's just that the
interactions on the suppression buffers are pushing the system over
the edge.

Counterintuitively, adding Suppression can actually increase your
broker traffic because the Suppression buffer has to provide resiliency
guarantees, so it needs its own changelog, even though the aggregation
immediately before it _also_ has a changelog.

Judging from your description, you were just trying to batch more, rather
than specifically trying to get "final results" semantics for the window
results. In that case, you might want to try removing the suppression
and instead increasing the "CACHE_MAX_BYTES_BUFFERING_CONFIG"
and "COMMIT_INTERVAL_MS_CONFIG" configurations.
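As a rough illustration of those two settings (plain string keys are used here so the snippet compiles without the kafka-streams jar; the values are arbitrary examples, not recommendations):

```java
import java.util.Properties;

public class BatchingConfig {
    public static Properties batchMoreConfig() {
        Properties props = new Properties();
        // Mirrors StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG:
        // a larger cache lets Streams coalesce more updates per key.
        props.put("cache.max.bytes.buffering", String.valueOf(100 * 1024 * 1024L));
        // Mirrors StreamsConfig.COMMIT_INTERVAL_MS_CONFIG:
        // a longer interval means fewer, larger flushes downstream.
        props.put("commit.interval.ms", String.valueOf(30_000L));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchMoreConfig());
    }
}
```

These properties would be passed into the `KafkaStreams` constructor alongside the rest of the application config.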

Hope this helps,
-John

On Fri, Jan 17, 2020, at 22:02, Sushrut Shivaswamy wrote:
> Hey,
> 
> I'm building a streams application where I'm trying to aggregate a stream
> of events
> and getting a list of events per key.
> `eventStream
> .groupByKey(Grouped.with(Serdes.String(), eventSerde))
> .windowedBy(TimeWindows.of(Duration.ofMillis(50)).grace(Duration.ofMillis(1)))
> .aggregate(
> ArrayList::new, (event, accum) -> {
> accum.add(event);
> return accum;
> })
> .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
> .toStream()
> .map((windowedKey, value) -> new KeyValue<>(windowedKey.key(), value))
> .map(eventProcessor::processEventsWindow)
> .to("event-window-chunks-queue", Produced.with(Serdes.String(),
> eventListSerde))`
> 
> As you can see I'm grouping events by key and capturing windowed lists of
> events for further processing.
> To be able to process the list of events per key in chunks I added
> `suppress()`.
> This does not seem to work though.
> I get this error multiple times:
> `Got error produce response with correlation id 5 on topic-partition
> app-test143-KTABLE-SUPPRESS-STATE-STORE-16-changelog-0, retrying
> (2147483646 attempts left). Error: NETWORK_EXCEPTION
> WARN org.apache.kafka.clients.producer.internals.Sender - Received invalid
> metadata error in produce request on partition
> shoonya-test143-KTABLE-SUPPRESS-STATE-STORE-16-changelog-0 due to
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.. Going to request metadata update now`
> 
> When I comment out the suppress() line it works fine but I get a large
> number of events in a list while processing chunks since it does not
> suppress already evaluated chunks.
> Can anyone help me out with what could be happening here?
> 
> Regards,
> Sushrut
>


Re: KTable Suppress not working

2020-01-17 Thread John Roesler
Ah, I should add, if you actually want to use suppression, or
you need to resolve a similar error message in the future, you
probably need to tweak the batch sizes and/or timeout configs
of the various clients, and maybe the server as well.

That error message kind of sounds like the server went silent
long enough that the http session expired, or maybe it suffered
a long pause of some kind (GC, de-scheduling, etc.) that caused
the OS to hang up the socket.

I'm not super familiar with diagnosing these issues; I'm just 
trying to point you in the right direction in case you wanted
to directly solve the given error instead of trying something 
different.

Thanks,
-John

On Fri, Jan 17, 2020, at 23:33, John Roesler wrote:
> Hi Sushrut,
> 
> That's frustrating... I haven't seen that before, but looking at the error
> in combination with what you say happens without suppress makes
> me think there's a large volume of data involved here. Probably,
> the problem isn't specific to suppression, but it's just that the
> interactions on the suppression buffers are pushing the system over
> the edge.
> 
> Counterintuitively, adding Suppression can actually increase your
> broker traffic because the Suppression buffer has to provide resiliency
> guarantees, so it needs its own changelog, even though the aggregation
> immediately before it _also_ has a changelog.
> 
> Judging from your description, you were just trying to batch more, rather
> than specifically trying to get "final results" semantics for the window
> results. In that case, you might want to try removing the suppression
> and instead increasing the "CACHE_MAX_BYTES_BUFFERING_CONFIG"
> and "COMMIT_INTERVAL_MS_CONFIG" configurations.
> 
> Hope this helps,
> -John
> 
> On Fri, Jan 17, 2020, at 22:02, Sushrut Shivaswamy wrote:
> > Hey,
> > 
> > I'm building a streams application where I'm trying to aggregate a stream
> > of events
> > and getting a list of events per key.
> > `eventStream
> > .groupByKey(Grouped.with(Serdes.String(), eventSerde))
> > .windowedBy(TimeWindows.of(Duration.ofMillis(50)).grace(Duration.ofMillis(1)))
> > .aggregate(
> > ArrayList::new, (event, accum) -> {
> > accum.add(event);
> > return accum;
> > })
> > .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
> > .toStream()
> > .map((windowedKey, value) -> new KeyValue<>(windowedKey.key(), value))
> > .map(eventProcessor::processEventsWindow)
> > .to("event-window-chunks-queue", Produced.with(Serdes.String(),
> > eventListSerde))`
> > 
> > As you can see I'm grouping events by key and capturing windowed lists of
> > events for further processing.
> > To be able to process the list of events per key in chunks I added
> > `suppress()`.
> > This does not seem to work though.
> > I get this error multiple times:
> > `Got error produce response with correlation id 5 on topic-partition
> > app-test143-KTABLE-SUPPRESS-STATE-STORE-16-changelog-0, retrying
> > (2147483646 attempts left). Error: NETWORK_EXCEPTION
> > WARN org.apache.kafka.clients.producer.internals.Sender - Received invalid
> > metadata error in produce request on partition
> > shoonya-test143-KTABLE-SUPPRESS-STATE-STORE-16-changelog-0 due to
> > org.apache.kafka.common.errors.NetworkException: The server disconnected
> > before a response was received.. Going to request metadata update now`
> > 
> > When I comment out the suppress() line it works fine but I get a large
> > number of events in a list while processing chunks since it does not
> > suppress already evaluated chunks.
> > Can anyone help me out with what could be happening here?
> > 
> > Regards,
> > Sushrut
> >
>


Re: KTable Suppress not working

2020-01-19 Thread John Roesler
Hi Sushrut,

I have to confess I don’t think I fully understand your last message, but I 
will try to help.

It sounds like maybe you’re thinking that streams would just repeatedly emit 
everything every commit? That is certainly not the case. If there are only 10 
events in window 1 and 10 in window 2, you would see at most 20 output events, 
regardless of any caching or suppression. That is, if you disable all caches, 
you get one output record (an updated aggregation result) for each input 
record. Enabling caches only serves to reduce the number. 
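A toy model of this point, assuming a simple running-sum aggregation (plain Java, nothing Kafka-specific; `commitEvery` loosely stands in for the commit interval / cache flush):

```java
import java.util.ArrayList;
import java.util.List;

public class CacheSketch {
    // Without caching: every input record emits an updated aggregate.
    static List<Integer> noCache(int[] inputs) {
        List<Integer> out = new ArrayList<>();
        int agg = 0;
        for (int i : inputs) {
            agg += i;
            out.add(agg);
        }
        return out;
    }

    // With caching: only the latest aggregate per "commit" is emitted,
    // so the output can only shrink relative to the uncached case.
    static List<Integer> withCache(int[] inputs, int commitEvery) {
        List<Integer> out = new ArrayList<>();
        int agg = 0;
        for (int n = 0; n < inputs.length; n++) {
            agg += inputs[n];
            if ((n + 1) % commitEvery == 0) {
                out.add(agg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] in = {1, 1, 1, 1};
        System.out.println(noCache(in));      // prints "[1, 2, 3, 4]"
        System.out.println(withCache(in, 2)); // prints "[2, 4]"
    }
}
```

Either way, no input record is ever emitted more than once per aggregate update; caching only drops intermediate updates.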

I hope this helps,
John


On Sat, Jan 18, 2020, at 08:36, Sushrut Shivaswamy wrote:
> Hey John,
> 
> I tried following the docs here about the configs:
> `streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
> 10 * 1024 * 1024L);
> // Set commit interval to 1 second.
> streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);`
> https://kafka.apache.org/10/documentation/streams/developer-guide/memory-mgmt
> 
> I'm trying to group events by id by accumulating them in a list and then
> spilt the aggregated list
> into smaller chunks for processing.
> I have a doubt about when windows expire and how aggregated values are
> flushed out.
> Let's assume in window 1 (W1) 10 records arrived and in window 2 (W2) 10 more
> records arrived for the same key.
> Assuming the cache can hold only 10 records in memory.
> Based on my understanding:
> At T1: 10 records from W1 are flushed
> At T2: 20 records from W1 + W2 are flushed.
> The records from W1 will be duplicated at the next commit time till that
> window expires.
> Is this accurate?
> If it is, can you share any way I can avoid/limit the number of times
> duplicate data is flushed?
> 
> Thanks,
> Sushrut
> 
> 
> 
> 
> 
> 
> On Sat, Jan 18, 2020 at 12:00 PM Sushrut Shivaswamy <
> sushrut.shivasw...@gmail.com> wrote:
> 
> > Thanks John,
> > I'll try increasing the "CACHE_MAX_BYTES_BUFFERING_CONFIG"
> > and "COMMIT_INTERVAL_MS_CONFIG" configurations.
> >
> > Thanks,
> > Sushrut
> >
> > On Sat, Jan 18, 2020 at 11:31 AM John Roesler  wrote:
> >
> >> Ah, I should add, if you actually want to use suppression, or
> >> you need to resolve a similar error message in the future, you
> >> probably need to tweak the batch sizes and/or timeout configs
> >> of the various clients, and maybe the server as well.
> >>
> >> That error message kind of sounds like the server went silent
> >> long enough that the http session expired, or maybe it suffered
> >> a long pause of some kind (GC, de-scheduling, etc.) that caused
> >> the OS to hang up the socket.
> >>
> >> I'm not super familiar with diagnosing these issues; I'm just
> >> trying to point you in the right direction in case you wanted
> >> to directly solve the given error instead of trying something
> >> different.
> >>
> >> Thanks,
> >> -John
> >>
> >> On Fri, Jan 17, 2020, at 23:33, John Roesler wrote:
> >> > Hi Sushrut,
> >> >
> >> > That's frustrating... I haven't seen that before, but looking at the
> >> error
> >> > in combination with what you say happens without suppress makes
> >> > me think there's a large volume of data involved here. Probably,
> >> > the problem isn't specific to suppression, but it's just that the
> >> > interactions on the suppression buffers are pushing the system over
> >> > the edge.
> >> >
> >> > Counterintuitively, adding Suppression can actually increase your
> >> > broker traffic because the Suppression buffer has to provide resiliency
> >> > guarantees, so it needs its own changelog, even though the aggregation
> >> > immediately before it _also_ has a changelog.
> >> >
> >> > Judging from your description, you were just trying to batch more,
> >> rather
> >> > than specifically trying to get "final results" semantics for the window
> >> > results. In that case, you might want to try removing the suppression
> >> > and instead increasing the "CACHE_MAX_BYTES_BUFFERING_CONFIG"
> >> > and "COMMIT_INTERVAL_MS_CONFIG" configurations.
> >> >
> >> > Hope this helps,
> >> > -John
> >> >
> >> > On Fri, Jan 17, 2020, at 22:02, Sushrut Shivaswamy wrote:
> >> > > Hey,
> >> > >
> >> > > I'm building a streams application where I'm trying to aggregate a stream of events [...]

Re: Does Merging two kafka-streams preserve co-partitioning

2020-01-20 Thread John Roesler
Hi Yair,

You should be fine! 

Merging does preserve copartitioning.

Also processing on that partition is single-threaded, so you don’t have to 
worry about races on the same key in your transformer.

Actually, you might want to use transformValues to inform Streams that you 
haven’t changed the key. Otherwise, it would need to repartition the result 
before you could do further stateful processing. 
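For illustration, the core of that transformer can be modeled in plain Java, with a `HashMap` standing in for the key-value state store (a sketch only, relying on the single-threaded-per-partition point above; a real implementation would keep the `Transformer` structure from the question):

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideFilter {
    // Stands in for the per-partition KeyValueStore; plain state is safe
    // here because Streams processes a given partition single-threaded.
    private final Map<Long, Boolean> manualSeen = new HashMap<>();

    // Returns the value to forward, or null to drop the record.
    public String transform(long key, String value, boolean isManual) {
        if (isManual) {
            manualSeen.put(key, Boolean.TRUE);
            return value;                     // always keep manual overrides
        }
        return manualSeen.containsKey(key) ? null : value;
    }

    public static void main(String[] args) {
        OverrideFilter f = new OverrideFilter();
        System.out.println(f.transform(1L, "auto-1", false));  // prints "auto-1"
        System.out.println(f.transform(1L, "manual-1", true)); // prints "manual-1"
        System.out.println(f.transform(1L, "auto-2", false));  // prints "null"
    }
}
```

The correctness of this logic is exactly what depends on copartitioning: both streams for a given key must land on the same task, and hence the same store.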

I hope this helps!

Thanks,
John

On Mon, Jan 20, 2020, at 05:27, Yair Halberstadt wrote:
> Hi
> I asked this question on stack-overflow and was wondering if anyone here
> could answer it:
> https://stackoverflow.com/questions/59820243/does-merging-two-kafka-streams-preserve-co-partitioning
> 
> 
> I have 2 co-partitioned kafka topics. One contains automatically generated
> data, and the other manual overrides.
> 
> I want to merge them and filter out any automatically generated data that
> has already been manually overidden, and then forward everything to a
> combined ChangeLog topic.
> 
> To do so I create a stream from each topic, and [merge the streams](
> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/KStream.html#merge-org.apache.kafka.streams.kstream.KStream-)
> using the dsl API.
> 
> I then apply the following transform, which stores any manual data, and
> deletes any automatic data which has already been manually overidden:
> (Scala but should be pretty easy to understand if you know java)
> 
> ```scala
> class FilterManuallyClassifiedTransformer(manualOverridesStoreName : String)
>   extends Transformer[Long, Data, KeyValue[Long, Data]] {
> 
>   // Array[Byte] used as dummy value since we don't use the value
>   var store: KeyValueStore[Long, Array[Byte]] = _
> 
>   override def init(context: ProcessorContext): Unit = {
> store = context.getStateStore(manualOverridesStoreName)
>   .asInstanceOf[KeyValueStore[Long, Array[Byte]]]
>   }
> 
>   override def close(): Unit = {}
> 
>   override def transform(key: Long, value: Data): KeyValue[Long, Data] = {
> if (value.getIsManual) {
>   store.put(key, Array.emptyByteArray)
>   new KeyValue(key, value)
> }
> else if (store.get(key) == null) {
>   new KeyValue(key, value)
> }
> else {
>   null
> }
>   }
> }
> ```
> 
> If I understand correctly, there is no guarantee this will work unless
> manual data and automatic data with the same key are in the same partition.
> Otherwise the manual override might be stored in a different state store to
> the one that the automatic data checks.
> 
> And even if they are stored in the same StateStore there might be race
> conditions, where an automatic data checks the state store, then the manual
> override is added to the state store, then the manual override is written
> to the output topic, then the automatic data is written to the output
> topic, leading to the automatic data overwriting the manual override.
> 
> Is that correct?
> 
> And if so will `merge` preserve the co-partitioning guarantee I need?
> 
> Thanks for your help
>


Re: stop

2020-01-22 Thread John Roesler
Hey Sowjanya,

That won't work. The "welcome" email you got when you signed up for the mailing 
list has instructions for unsubscribing:

> To remove your address from the list, send a message to:
>  

Cheers,
-John

On Wed, Jan 22, 2020, at 10:12, Sowjanya Karangula wrote:
> stop
>


Re: Resource based kafka assignor

2020-01-31 Thread John Roesler
Hi Srinivas,

Your approach sounds fine, as long as you don’t need the view of the assignment 
to be strictly consistent. As a rough approximation, it could work. 

On the other hand, if you’re writing a custom assignor, you could consider 
using the SubscriptionInfo field of the joinGroup request to encode arbitrary 
information from the members to the leader, which it can use in making 
decisions. So you could encode a custom “node id” there and not have to rely on 
patterns in the group id. Or you could even just directly encode node load 
information and use that to influence the assignment. 
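As a sketch of that encoding idea (a hypothetical length-prefixed UTF-8 layout; the assignor API hands the subscription userData around as raw bytes, so the wire format is entirely up to you):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class UserDataCodec {
    // Hypothetical subscription userData layout: [int length][UTF-8 node id].
    static ByteBuffer encode(String nodeId) {
        byte[] bytes = nodeId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + bytes.length);
        buf.putInt(bytes.length);
        buf.put(bytes);
        buf.flip();
        return buf;
    }

    static String decode(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("ec2-node-7"))); // prints "ec2-node-7"
    }
}
```

The leader-side assignor would decode each member's node id (or load figure) from its subscription and bias the partition assignment accordingly.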

Iirc, there’s no explicit “trigger rebalance” command, but you can still make 
it happen by doing something like unsubscribing and resubscribing again. 

I hope this helps!
John

On Thu, Jan 30, 2020, at 09:25, Devaki, Srinivas wrote:
> Also, want to clarify one more doubt,
> 
> is there any way for the client to explicitly trigger a rebalance without
> dying itself?
> 
> On Thu, Jan 30, 2020 at 7:54 PM Devaki, Srinivas 
> wrote:
> 
> > Hi All,
> >
> > We have a set of logstash consumer groups running under the same set of
> > instances, we have decided to run separate consumer groups subscribing
> > multiple topics instead of running a single consumer group for all topics (the
> > reasoning behind this decision is because of how our elasticsearch cluster
> > is designed).
> >
> > Since we are running multiple consumer groups, sometimes we have detected
> > that a few ec2 nodes are receiving multiple high throughput topics in
> > different consumer groups. which was expected based on the implementation
> > of round robin assignor.
> >
> > So I've decided to make a partition assignor which will consider the
> > assignment based on other consumer group assignment.
> >
> > Could you please give me some pointers on how to proceed. This is my
> > initial ideas on the problem.
> >
> > Solution #0:
> > write an assignor, and use a specific consumer id pattern across all
> > consumer groups, and in the assignor do a describe on all consumer groups
> > and based on the topic throughput and the other consumer group assignment
> > decide the assignment of this topic
> >
> >
> >
> >
> > Thanks
> >
>


Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-10 Thread John Roesler
Hi,

I’m sorry for the trouble. It looks like it was a mistake during

https://github.com/apache/kafka/pull/6521

Specifically, while addressing code review comments to change a bunch of other 
logs from debugs to warnings, that one seems to have been included by accident: 
https://github.com/apache/kafka/commit/ac27e8578f69d60a56ba28232d7e96c76957f66c

I’ll see if I can fix it today.

Regarding Bruno's thoughts, there was a pretty old decision to capture the 
"skipped records" as a metric for visibility and log it at the debug level for 
debuggability. We decided that "warning" wasn't the right level because Streams 
is operating completely as specified.

However, I do agree that it doesn't seem right to see more skipped records 
during start-up; I would expect to see exactly the same records skipped during 
start-up as during regular processing, since the skipping logic is completely 
deterministic and based on the sequence of timestamps your records have in the 
topic.  Maybe you just notice it more during startup? I.e., if there are 1000 
warning logs spread over a few months, then you don't notice it, but when you 
see them all together at start-up, it's more concerning?

Thanks,
-John


On Mon, Feb 10, 2020, at 10:15, Bruno Cadonna wrote:
> Hi,
> 
> I am pretty sure this was intentional. All skipped records log
> messages are on WARN level.
> 
> If a lot of your records are skipped on app restart with this log
> message on WARN-level, they were also skipped with the log message on
> DEBUG-level. You simply did not know about it before. With an
> in-memory window store, this message is logged when a window with a
> start time older than the current stream time minus the retention
> period is put into the window store, i.e., the window is NOT inserted
> into the window store. If you get a lot of them on app restart, you
> should have a look at the timestamps of your records and the retention
> of your window store. If those values do not explain the behavior,
> please try to find a minimal example that shows the issue and post it
> here on the mailing list.
> 
> On Mon, Feb 10, 2020 at 2:27 PM Samek, Jiří  wrote:
> >
> > Hi,
> >
> > in
> > https://github.com/apache/kafka/commit/9f5a69a4c2d6ac812ab6134e64839602a0840b87#diff-a5cfe68a5931441eff5f00261653dd10R134
> >
> > log level of "Skipping record for expired segment" was changed from debug
> > to warn. Was it intentional change? Should it be somehow handled by user?
> > How can user handle it? I am getting a lot of these on app restart.
>


Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-10 Thread John Roesler
Hey all,

Sorry for the confusion. Bruno set me straight offline.

Previously, we had metrics for each reason for skipping records, and the
rationale was that you would monitor the metrics and only turn to the logs
if you needed to *debug* unexpected record skipping. Note that skipping
records by itself isn't a cause for concern, since this is exactly what Streams
is designed to do in a number of situations.

However, during the KIP-444 discussion, the decision was reversed, and we
decided to just log one "roll-up" metric for all skips and increase the log
messages to warning level for debuggability. This particularly makes sense
because you otherwise would have to restart the application to change the
log level if you needed to figure out why the single skipped-record metric
is non-zero. And then you may not even observe it again.

I either missed the memo on that discussion, or participated in it and then
forgot it even happened. I'm not sure I want to look back at the thread to
find out.

Anyway, I've closed the PR I opened to move it back to debug. We should
still try to help figure out the root cause of this particular email thread,
though.

Thanks,
-John

On Mon, Feb 10, 2020, at 12:20, Sophie Blee-Goldman wrote:
> While I agree that seems like it was probably a refactoring mistake, I'm
> not
> convinced it isn't the right thing to do. John, can you reiterate the
> argument
> for setting it to debug way back when?
> 
> I would actually present this exact situation as an argument for keeping it
> as
> warn, since something indeed seems fishy here that was only surfaced
> through this warning. That said, maybe the metric is the more appropriate
> way to bring attention to this: not sure if it's info or debug level
> though, or
> how likely it is that anyone really pays attention to it?
> 
> On Mon, Feb 10, 2020 at 9:53 AM John Roesler  wrote:
> 
> > Hi,
> >
> > I’m sorry for the trouble. It looks like it was a mistake during
> >
> > https://github.com/apache/kafka/pull/6521
> >
> > Specifically, while addressing code review comments to change a bunch of
> > other logs from debugs to warnings, that one seems to have been included by
> > accident:
> > https://github.com/apache/kafka/commit/ac27e8578f69d60a56ba28232d7e96c76957f66c
> >
> > I’ll see if I can fix it today.
> >
> > Regarding Bruno's thoughts, there was a pretty old decision to capture the
> > "skipped records" as a metric for visibility and log it at the debug level
> > for debuggability. We decided that "warning" wasn't the right level because
> > Streams is operating completely as specified.
> >
> > However, I do agree that it doesn't seem right to see more skipped records
> > during start-up; I would expect to see exactly the same records skipped
> > during start-up as during regular processing, since the skipping logic is
> > completely deterministic and based on the sequence of timestamps your
> > records have in the topic.  Maybe you just notice it more during startup?
> > I.e., if there are 1000 warning logs spread over a few months, then you
> > don't notice it, but when you see them all together at start-up, it's more
> > concerning?
> >
> > Thanks,
> > -John
> >
> >
> > On Mon, Feb 10, 2020, at 10:15, Bruno Cadonna wrote:
> > > Hi,
> > >
> > > I am pretty sure this was intentional. All skipped records log
> > > messages are on WARN level.
> > >
> > > If a lot of your records are skipped on app restart with this log
> > > message on WARN-level, they were also skipped with the log message on
> > > DEBUG-level. You simply did not know about it before. With an
> > > in-memory window store, this message is logged when a window with a
> > > start time older than the current stream time minus the retention
> > > period is put into the window store, i.e., the window is NOT inserted
> > > into the window store. If you get a lot of them on app restart, you
> > > should have a look at the timestamps of your records and the retention
> > > of your window store. If those values do not explain the behavior,
> > > please try to find a minimal example that shows the issue and post it
> > > here on the mailing list.
> > >
> > > On Mon, Feb 10, 2020 at 2:27 PM Samek, Jiří 
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > in
> > > >
> > https://github.com/apache/kafka/commit/9f5a69a4c2d6ac812ab6134e64839602a0840b87#diff-a5cfe68a5931441eff5f00261653dd10R134
> > > >
> > > > log level of "Skipping record for expired segment" was changed from
> > debug
> > > > to warn. Was it intentional change? Should it be somehow handled by
> > user?
> > > > How can user handle it? I am getting a lot of these on app restart.
> > >
> >
>


Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-11 Thread John Roesler
> [...] I don't fully understand how such a case can even
> theoretically happen. I expect that a window, in order to be written to the
> changelog topic, first needs to go through "put"; so even if it's mixed on
> the input side, it should be skipped if expired at the moment of "put"
> (relatively to observedStreamTime) and on restoration everything should be
> fine.
> 
> As the next step, I would like to list/inspect records and their timestamps
> from given partition of the changelog topic via a command line tool (or in
> some other way) - to confirm if they are really stored this way. If you
> have a tip on how to do it, please let me know.
> 
> That is all I have for now. I would like to resolve it. I will post it here
> if I come up with something new.
> 
> Thank you
> Jiri
> 
> 
> 
> On Mon, Feb 10, 2020 at 10:14 PM John Roesler  wrote:
> >
> > Hey all,
> >
> > Sorry for the confusion. Bruno set me straight offline.
> >
> > Previously, we had metrics for each reason for skipping records, and the
> > rationale was that you would monitor the metrics and only turn to the logs
> > if you needed to *debug* unexpected record skipping. Note that skipping
> > records by itself isn't a cause for concern, since this is exactly what
> Streams
> > is designed to do in a number of situations.
> >
> > However, during the KIP-444 discussion, the decision was reversed, and we
> > decided to just log one "roll-up" metric for all skips and increase the
> log
> > messages to warning level for debuggability. This particularly makes sense
> > because you otherwise would have to restart the application to change the
> > log level if you needed to figure out why the single skipped-record metric
> > is non-zero. And then you may not even observe it again.
> >
> > I either missed the memo on that discussion, or participated in it and
> then
> > forgot it even happened. I'm not sure I want to look back at the thread to
> > find out.
> >
> > Anyway, I've closed the PR I opened to move it back to debug. We should
> > still try to help figure out the root cause of this particular email
> > thread, though.
> >
> > Thanks,
> > -John
> >
> > On Mon, Feb 10, 2020, at 12:20, Sophie Blee-Goldman wrote:
> > > While I agree that seems like it was probably a refactoring mistake, I'm not
> > > convinced it isn't the right thing to do. John, can you reiterate the
> > > argument for setting it to debug way back when?
> > >
> > > I would actually present this exact situation as an argument for keeping it
> > > as warn, since something indeed seems fishy here that was only surfaced
> > > through this warning. That said, maybe the metric is the more appropriate
> > > way to bring attention to this: not sure if it's info or debug level though,
> > > or how likely it is that anyone really pays attention to it?
> > >
> > > On Mon, Feb 10, 2020 at 9:53 AM John Roesler  wrote:
> > >
> > > > Hi,
> > > >
> > > > I’m sorry for the trouble. It looks like it was a mistake during
> > > >
> > > > https://github.com/apache/kafka/pull/6521
> > > >
> > > > Specifically, while addressing code review comments to change a bunch of
> > > > other logs from debugs to warnings, that one seems to have been included
> > > > by accident:
> > > > https://github.com/apache/kafka/commit/ac27e8578f69d60a56ba28232d7e96c76957f66c
> > > >
> > > > I’ll see if I can fix it today.
> > > >
> > > > Regarding Bruno's thoughts, there was a pretty old decision to capture the
> > > > "skipped records" as a metric for visibility and log it at the debug level
> > > > for debuggability. We decided that "warning" wasn't the right level because
> > > > Streams is operating completely as specified.
> > > >
> > > > However, I do agree that it doesn't seem right to see more skipped records
> > > > during start-up; I would expect to see exactly the same records skipped
> > > > during start-up as during regular processing, since the skipping logic is
> > > > completely deterministic and based on the sequence of timestamps your
> > > > records have in the topic. Maybe you just notice it more during startup?
> > > > I.e., if there are 1000 warning logs spread over 

Re: Using Kafka AdminUtils

2020-02-16 Thread John Roesler
Hi Victoria,

I’ve used the AdminClient for this kind of thing before. It’s the official Java 
client for administrative actions like creating topics. You can create topics 
with any partition count, replication factor, or any other config. 

I hope this helps,
John

On Sat, Feb 15, 2020, at 22:41, Victoria Zuberman wrote:
> Hi,
> 
> I have an application based on Kafka Streams.
> It reads from Kafka topic (I call this topic “input topic”).
> That topic has many partitions and their number varies based on the env 
> in which application is running.
> I don’t want to create different input topics manually.
> Configuration of auto.create.topics.enable and num.partitions is not 
> enough for me.
> The solution I am looking to implement is to check during application 
> init whether the input topic exists and, if not, to create it with the 
> relevant partition count and replication factor.
> 
> I found the following example that uses kafka.admin.AdminUtils and it 
> seems to be suitable:
> https://www.codota.com/code/java/methods/kafka.admin.AdminUtils/createTopic
> 
> Please advise whether using AdminUtils is considered a good practice.
> Is AdminUtils functionality considered stable and reliable?
> If there are other solutions, I would appreciate to hear about them.
> 
> Thanks,
> Victoria
> 
> ---
> NOTICE:
> This email and all attachments are confidential, may be proprietary, 
> and may be privileged or otherwise protected from disclosure. They are 
> intended solely for the individual or entity to whom the email is 
> addressed. However, mistakes sometimes happen in addressing emails. If 
> you believe that you are not an intended recipient, please stop reading 
> immediately. Do not copy, forward, or rely on the contents in any way. 
> Notify the sender and/or Imperva, Inc. by telephone at +1 (650) 
> 832-6006 and then delete or destroy any copy of this email and its 
> attachments. The sender reserves and asserts all rights to 
> confidentiality, as well as any privileges that may apply. Any 
> disclosure, copying, distribution or action taken or omitted to be 
> taken by an unintended recipient in reliance on this message is 
> prohibited and may be unlawful.
> Please consider the environment before printing this email.
>


Re: Using Kafka AdminUtils

2020-02-16 Thread John Roesler
Hi Victoria,

Sorry for the vagueness, I’m not in front of a computer right now, so I can 
only answer from memory. 

I’m not sure why that interface is still tagged “evolving”. Any changes to it 
would go through a deprecation period, just like any public interface in Kafka. 
We should probably remove that annotation. 

Your questions are good ones. It seems like we should update the JavaDoc to 
clarify. My observation is that you do get the exception if the topic already 
exists, and I doubt that this API would change any configuration for 
already-present topics. In my usage, I’ve just caught that exception and 
ignored it. 
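The pattern described above — call createTopics, and treat a TopicExistsException cause as "topic already present" — can be sketched in plain Java. To keep the snippet self-contained and runnable without a broker, the Kafka types are replaced by minimal stand-ins (an assumption of this sketch); in a real application you would use org.apache.kafka.clients.admin.AdminClient, NewTopic, and org.apache.kafka.common.errors.TopicExistsException instead.

```java
import java.util.concurrent.ExecutionException;

// Stand-in for org.apache.kafka.common.errors.TopicExistsException, so this
// sketch compiles without the kafka-clients dependency.
class TopicExistsException extends RuntimeException {
    TopicExistsException(String msg) { super(msg); }
}

// Models AdminClient.createTopics(...).all().get(), which throws an
// ExecutionException wrapping the broker-side error.
interface TopicCreator {
    void createTopic(String name, int partitions, short replicationFactor)
            throws ExecutionException, InterruptedException;
}

public class EnsureTopic {
    /**
     * Creates the topic if it is missing. An already-existing topic is not
     * treated as an error, and its configuration is left untouched.
     * Returns true if the topic was newly created, false if it already existed.
     */
    static boolean ensure(TopicCreator admin, String name,
                          int partitions, short replicationFactor)
            throws ExecutionException, InterruptedException {
        try {
            admin.createTopic(name, partitions, replicationFactor);
            return true;
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TopicExistsException) {
                return false; // topic was already there: ignore, as described
            }
            throw e; // any other failure is still an error
        }
    }
}
```

With the real client, `admin.createTopics(Collections.singleton(new NewTopic(name, partitions, replicationFactor))).all().get()` takes the place of the stand-in call.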

Thanks,
John

On Sun, Feb 16, 2020, at 13:12, Victoria Zuberman wrote:
> Hi, John
> 
> Thanks a lot for valuable information.
> I looked at KafkaAdminClient and I see that it offers createTopics 
> method that indeed seems suitable.
> 
> I still have a couple of questions:
> 
> 1. The documentation does not mention the expected 
> behavior if the specified topic already exists.
>  Will it fail?
>  Will it throw a TopicExistsException?
> If the topic existed before createTopics was called, will it remain 
> unchanged?
> The behavior is not easily deduced from the KafkaAdminClient code 
> alone; I did try.
> 
> 2. I see that AdminClient has been supported for a while now, but the API 
> is still marked as Evolving.
> From version notes it seems that its basic functionality (like 
> createTopics) remains pretty stable.
> Is it considered stable enough for production?
> 
> Thanks,
> Victoria
> 
> 
> On 16/02/2020, 20:15, "John Roesler"  wrote:
> 
> Hi Victoria,
> 
> I’ve used the AdminClient for this kind of thing before. It’s the 
> official java client for administrative actions like creating topics. 
> You can create topics with any partition count, replication, or any 
> other config.
> 
> I hope this helps,
> John
> 

Re: KTable in Compact Topic takes too long to be updated

2020-02-19 Thread John Roesler
Hi Renato,

Can you describe a little more about the nature of the join+aggregation
logic? It sounds a little like the KTable represents the result of aggregating
messages from the KStream?

If that's the case, the operation you probably wanted was like:

> KStream.groupBy().aggregate()

which produces a KTable view of the aggregation result, and also guarantees
that when processing the second message, you'll see the result of having 
processed the first.

Let me know if I've misunderstood.

Thanks,
-John

On Wed, Feb 19, 2020, at 14:03, Renato Melo wrote:
> Hi Kafka Community,
> 
> Please take a look into my use case:
> 
> First message1
> 1. We have a KStream joined to a KTable (compact topic).
> 2. We receive message1 from the KStream and aggregate it into the joined 
> messageA from the KTable. 
> 3. We push messageA, with message1 aggregated in, back into the KTable.
> 
> Second message2
> 4. Message2 arrives on the KStream and joins to messageA from the KTable, 
> which we expected to be updated. To our surprise, messageA was not yet updated.
> 5. We aggregate message2 into messageA.
> 6. We push messageA to the KTable (compact topic), and the first 
> aggregated message is overwritten.
> 
> Is there a way to speed up the update in the KTable (compact topic)?
> 
> Is something wrong with my use case?
> 
> I do appreciate any help. Thank you in advance.
> 
> Renato de Melo
> 
> 
>


Re: KTable in Compact Topic takes too long to be updated

2020-02-20 Thread John Roesler
Aha! Thanks, Renato, that's very clear.

I think there's a couple of ways you can model this, but one thing that
applies to all of them is that you should consider the `max.task.idle.ms`
configuration option. If you set it higher than `max.poll.interval.ms`,
then Streams will be able to ensure that it processes records from
multiple inputs in timestamp order (as long as the upstream systems
have already buffered the data). It's not strictly necessary for the algorithms
I'd propose, but it's probably more intuitive, e.g., if you process the "order"
record before the "item" records for that order.
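As a concrete sketch, that option is just a Streams configuration property. The application id and values below are illustrative assumptions; only the property names come from Kafka.

```java
import java.util.Properties;

public class IdleConfigSketch {
    public static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "orders-app");        // illustrative
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        // Let a task wait for data to be buffered on all of its input
        // partitions before choosing the next record, so records can be
        // processed in timestamp order across inputs.
        props.put("max.task.idle.ms", "60000");
        return props;
    }
}
```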

Offhand, it seems like you could take two high-level approaches here.

One is to collect the items into an "order items" table, keyed by order
id, and _then_ join it with the order table. This is probably more attractive
if you also have other uses for the order table that don't involve the items.
It looks like this:

KStream<String, Item> items = ...
KTable<String, Order> orders = ...

KTable<String, Map<String, Item>> orderItems = 
  items
    .groupBy((key, item) -> item.orderID)
    .aggregate(
      () -> new HashMap<String, Item>(),
      (orderId, item, itemMap) -> { itemMap.put(item.itemID, item); return itemMap; }
    );

KTable<String, OrderWithItems> result = 
  orders.join(
    orderItems,
    (order, itemsForOrder) -> new OrderWithItems(order, itemsForOrder)
  );

Note that this is just a sketch. In reality, you'll have to specify a serde for
the HashMap, etc.


The other idea I had was to treat it as a pure aggregation, by first merging
orders and items into a single stream and just doing the stream aggregation
on it. Something like:

KStream<String, Item> items = ...
KStream<String, Order> orders = ...

KStream<String, Object> itemsByOrderId =
  items.selectKey((itemId, item) -> item.orderID)
       .mapValues(item -> (Object) item);

KStream<String, Object> merged =
  orders.mapValues(order -> (Object) order).merge(itemsByOrderId);

KTable<String, OrderWithItems> result =
  merged
    .groupByKey()
    .aggregate(
      () -> new OrderWithItems(),
      (orderId, object, orderWithItems) -> {
        if (object instanceof Order) {
          orderWithItems.setOrder((Order) object);
        } else if (object instanceof Item) {
          orderWithItems.addItem((Item) object);
        } else {
          throw new IllegalArgumentException("Unexpected value: " + object);
        }
        return orderWithItems;
      }
    );

I think the most important consideration between these is just which
one you and your team find more intuitive. They should have about the
same performance characteristics except:
* The first option needs both input KTables to be stored (to compute the join)
* The second option stores just one KTable, consisting of the _result_
The sizes of these two data sets should actually be about the same
*except* that if you _also_ choose to materialize the _result_ of the join
then the first approach would use twice the storage.

But I'd really strongly recommend to favor clear code over efficient code,
unless you know ahead of time that storage efficiency is going to be a
big problem.
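For what it's worth, the per-record branching in the merge-then-aggregate approach is plain Java that can be unit-tested without a broker. A minimal self-contained sketch, where the Order/Item/OrderWithItems shapes are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative value types; the real classes would carry your domain fields.
class Order {
    final String orderId;
    Order(String orderId) { this.orderId = orderId; }
}

class Item {
    final String itemId;
    Item(String itemId) { this.itemId = itemId; }
}

public class OrderWithItems {
    Order order;
    final List<Item> items = new ArrayList<>();

    void setOrder(Order order) { this.order = order; }
    void addItem(Item item) { this.items.add(item); }

    // The same branching the Streams aggregator performs, as a pure function:
    // dispatch on the record type and fold it into the accumulator.
    static OrderWithItems aggregate(Object record, OrderWithItems acc) {
        if (record instanceof Order) {
            acc.setOrder((Order) record);
        } else if (record instanceof Item) {
            acc.addItem((Item) record);
        } else {
            throw new IllegalArgumentException("Unexpected value: " + record);
        }
        return acc;
    }

    public static void main(String[] args) {
        OrderWithItems acc = new OrderWithItems();
        OrderWithItems.aggregate(new Order("o1"), acc);
        OrderWithItems.aggregate(new Item("i1"), acc);
        OrderWithItems.aggregate(new Item("i2"), acc);
        System.out.println(acc.order.orderId + ":" + acc.items.size()); // prints "o1:2"
    }
}
```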

I hope this helps!
-John


On Wed, Feb 19, 2020, at 16:19, Renato Melo wrote:
>  Hi John,
> Thank you for your reply.
> 
> Let me clarify.
> 
> I used the word "aggregate", but we are not using aggregate functions. Our 
> case is a whole-part relationship between messageA and message1, 2, ..., n, 
> like order and order items.
> 
> So, translating our case, messageA is the order and message1 and message2 are items.
> 
> When I said we aggregate, I was trying to say we add the item to the order. 
> 
> So we have an order in the KTable.
> 
> When the first item arrives, Kafka Streams joins the item to order.
> 
> Then we add the item to the order. Do some calculations. And them we 
> have a separated Kafka producer that pushes the order back to the 
> KTable.
> After the first item we expected this:
> Order (item1)
> Then the second item arrives and Kafka Streams joins item2 to the order 
> in the stream, but the order is not updated yet. So we add 
> item2 to the order, and instead of having:
> Order(item1, item2) 
> 
> we have 
> 
> Order(item2)
> I hope I made more clear our scenario.
> Regards,
> 
> Renato de Melo
> 

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread John Roesler
Congrats, Konstantine! Awesome news.
-John

On Wed, Feb 26, 2020, at 16:39, Bill Bejeck wrote:
> Congratulations Konstantine! Well deserved.
> 
> -Bill
> 
> On Wed, Feb 26, 2020 at 5:37 PM Jason Gustafson  wrote:
> 
> > The PMC for Apache Kafka has invited Konstantine Karantasis as a committer
> > and we
> > are pleased to announce that he has accepted!
> >
> > Konstantine has contributed 56 patches and helped to review even more. His
> > recent work includes a major overhaul of the Connect task management system
> > in order to support incremental rebalancing. In addition to code
> > contributions, Konstantine helps the community in many other ways including
> > talks at meetups and at Kafka Summit and answering questions on
> > stackoverflow. He consistently shows good judgement in design and a careful
> > attention to details when it comes to code.
> >
> > Thanks for all the contributions and looking forward to more!
> >
> > Jason, on behalf of the Apache Kafka PMC
> >
>

