Re: [DISCUSS] KIP-138: Change punctuate semantics

Matthias J. Sax Mon, 24 Apr 2017 10:23:21 -0700

>> Would a dashboard need perfect regularity? Wouldn't an upper bound suffice?


If you go with stream-time and don't have any input records for all
partitions, punctuate would not be called at all, and thus your
dashboard would "freeze".

>> I thought about cron-type things, but aren't they better triggered by an
>> external scheduler (they're more flexible anyway), which then feeds
>> "commands" into the topology?

I guess it depends what kind of periodic action you want to trigger. The
"cron job" was just an analogy. Maybe it's just a heartbeat to some
other service, that signals that your Streams app is still running.


-Matthias


On 4/24/17 10:02 AM, Michal Borowiecki wrote:
> Thanks!
> 
> Would a dashboard need perfect regularity? Wouldn't an upper bound suffice?
> 
> Unless too frequent messages on replay could overpower it?
> 
> 
> I thought about cron-type things, but aren't they better triggered by an
> external scheduler (they're more flexible anyway), which then feeds
> "commands" into the topology?
> 
> Just my 2c.
> 
> Cheers,
> 
> Michal
> 
> 
> On 24/04/17 17:57, Matthias J. Sax wrote:
>> A simple example would be some dashboard app, that needs to get
>> "current" status in regular time intervals (ie, and real-time app).
>>
>> Or something like a "scheduler" -- think "cron job" application.
>>
>>
>> -Matthias
>>
>> On 4/24/17 2:23 AM, Michal Borowiecki wrote:
>>> Hi Matthias,
>>>
>>> I agree it's difficult to reason about the hybrid approach, I certainly
>>> found it hard and I'm totally on board with the mantra.
>>>
>>> I'd be happy to limit the scope of this KIP to add system-time
>>> punctuation semantics (in addition to existing stream-time semantics)
>>> and leave more complex schemes for users to implement on top of that.
>>>
>>> Further additional PunctuationTypes, could then be added by future KIPs,
>>> possibly including the hybrid approach once it has been given more thought.
>>>
>>>> There are real-time applications, that want to get
>>>> callbacks in regular system-time intervals (completely independent from
>>>> stream-time).
>>> Can you please describe what they are, so that I can put them on the
>>> wiki for later reference?
>>>
>>> Thanks,
>>>
>>> Michal
>>>
>>>
>>> On 23/04/17 21:27, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> I do like Damian's API proposal about the punctuation callback function.
>>>>
>>>> I also did reread the KIP and thought about the semantics we want to
>>>> provide.
>>>>
>>>>> Given the above, I don't see a reason any more for a separate system-time 
>>>>> based punctuation.
>>>> I disagree here. There are real-time applications, that want to get
>>>> callbacks in regular system-time intervals (completely independent from
>>>> stream-time). Thus we should allow this -- if we really follow the
>>>> "hybrid" approach, this could be configured with stream-time interval
>>>> infinite and delay whatever system-time punctuation interval you want to
>>>> have. However, I would like to add a proper API for this and do this
>>>> configuration under the hood (that would allow one implementation within
>>>> all kind of branching for different cases).
>>>>
>>>> Thus, we definitely should have PunctutionType#StreamTime and
>>>> #SystemTime -- and additionally, we _could_ have #Hybrid. Thus, I am not
>>>> a fan of your latest API proposal.
>>>>
>>>>
>>>> About the hybrid approach in general. On the one hand I like it, on the
>>>> other hand, it seems to be rather (1) complicated (not necessarily from
>>>> an implementation point of view, but for people to understand it) and
>>>> (2) mixes two semantics together in a "weird" way". Thus, I disagree with:
>>>>
>>>>> It may appear complicated at first but I do think these semantics will
>>>>> still be more understandable to users than having 2 separate punctuation
>>>>> schedules/callbacks with different PunctuationTypes.
>>>> This statement only holds if you apply strong assumptions that I don't
>>>> believe hold in general -- see (2) for details -- and I think it is
>>>> harder than you assume to reason about the hybrid approach in general.
>>>> IMHO, the hybrid approach is a "false friend" that seems to be easy to
>>>> reason about...
>>>>
>>>>
>>>> (1) Streams always embraced "easy to use" and we should really be
>>>> careful to keep it this way. On the other hand, as we are talking about
>>>> changes to PAPI, it won't affect DSL users (DSL does not use punctuation
>>>> at all at the moment), and thus, the "easy to use" mantra might not be
>>>> affected, while it will allow advanced users to express more complex stuff.
>>>>
>>>> I like the mantra: "make simple thing easy and complex things possible".
>>>>
>>>> (2) IMHO the major disadvantage (issue?) of the hybrid approach is the
>>>> implicit assumption that even-time progresses at the same "speed" as
>>>> system-time during regular processing. This implies the assumption that
>>>> a slower progress in stream-time indicates the absence of input events
>>>> (and that later arriving input events will have a larger event-time with
>>>> high probability). Even if this might be true for some use cases, I
>>>> doubt it holds in general. Assume that you get a spike in traffic and
>>>> for some reason stream-time does advance slowly because you have more
>>>> records to process. This might trigger a system-time based punctuation
>>>> call even if this seems not to be intended. I strongly believe that it
>>>> is not easy to reason about the semantics of the hybrid approach (even
>>>> if the intentional semantics would be super useful -- but I doubt that
>>>> we get want we ask for).
>>>>
>>>> Thus, I also believe that one might need different "configuration"
>>>> values for the hybrid approach if you run the same code for different
>>>> scenarios: regular processing, re-processing, catching up scenario. And
>>>> as the term "configuration" implies, we might be better off to not mix
>>>> configuration with business logic that is expressed via code.
>>>>
>>>>
>>>> One more comment: I also don't think that the hybrid approach is
>>>> deterministic as claimed in the use-case subpage. I understand the
>>>> reasoning and agree, that it is deterministic if certain assumptions
>>>> hold -- compare above -- and if configured correctly. But strictly
>>>> speaking it's not because there is a dependency on system-time (and
>>>> IMHO, if system-time is involved it cannot be deterministic by definition).
>>>>
>>>>
>>>>> I see how in theory this could be implemented on top of the 2 punctuate
>>>>> callbacks with the 2 different PunctuationTypes (one stream-time based,
>>>>> the other system-time based) but it would be a much more complicated
>>>>> scheme and I don't want to suggest that.
>>>> I agree that expressing the intended hybrid semantics is harder if we
>>>> offer only #StreamTime and #SystemTime punctuation. However, I also
>>>> believe that the hybrid approach is a "false friend" with regard to
>>>> reasoning about the semantics (it indicates that it more easy as it is
>>>> in reality). Therefore, we might be better off to not offer the hybrid
>>>> approach and make it clear to a developed, that it is hard to mix
>>>> #StreamTime and #SystemTime in a semantically sound way.
>>>>
>>>>
>>>> Looking forward to your feedback. :)
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>>
>>>> On 4/22/17 11:43 AM, Michal Borowiecki wrote:
>>>>> Hi all,
>>>>>
>>>>> Looking for feedback on the functional interface approach Damian
>>>>> proposed. What do people think?
>>>>>
>>>>> Further on the semantics of triggering punctuate though:
>>>>>
>>>>> I ran through the 2 use cases that Arun had kindly put on the wiki
>>>>> (https://cwiki.apache.org/confluence/display/KAFKA/Punctuate+Use+Cases)
>>>>> in my head and on a whiteboard and I can't find a better solution than
>>>>> the "hybrid" approach he had proposed.
>>>>>
>>>>> I see how in theory this could be implemented on top of the 2 punctuate
>>>>> callbacks with the 2 different PunctuationTypes (one stream-time based,
>>>>> the other system-time based) but it would be a much more complicated
>>>>> scheme and I don't want to suggest that.
>>>>>
>>>>> However, to add to the hybrid algorithm proposed, I suggest one
>>>>> parameter to that: a tolerance period, expressed in milliseconds
>>>>> system-time, after which the punctuation will be invoked in case the
>>>>> stream-time advance hasn't triggered it within the requested interval
>>>>> since the last invocation of punctuate (as recorded in system-time)
>>>>>
>>>>> This would allow a user-defined tolerance for late arriving events. The
>>>>> trade off would be left for the user to decide: regular punctuation in
>>>>> the case of absence of events vs allowing for records arriving late or
>>>>> some build-up due to processing not catching up with the event rate.
>>>>> In the one extreme, this tolerance could be set to infinity, turning
>>>>> hybrid into simply stream-time based punctuate, like we have now. In the
>>>>> other extreme, the tolerance could be set to 0, resulting in a
>>>>> system-time upper bound on the effective punctuation interval.
>>>>>
>>>>> Given the above, I don't see a reason any more for a separate
>>>>> system-time based punctuation. The "hybrid" approach with 0ms tolerance
>>>>> would under normal operation trigger at regular intervals wrt the
>>>>> system-time, except in cases of re-play/catch-up, where the stream time
>>>>> advances faster than system time. In these cases punctuate would happen
>>>>> more often than the specified interval wrt system time. However, the
>>>>> use-cases that need system-time punctuations (that I've seen at least)
>>>>> really only have a need for an upper bound on punctuation delay but
>>>>> don't need a lower bound.
>>>>>
>>>>> To that effect I'd propose the api to be as follows, on ProcessorContext:
>>>>>
>>>>> schedule(Punctuator callback, long interval, long toleranceIterval); // 
>>>>> schedules punctuate at stream-time intervals with a system-time upper 
>>>>> bound of (interval+toleranceInterval)
>>>>>
>>>>> schedule(Punctuator callback, long interval); // schedules punctuate at 
>>>>> stream-time intervals without an system-time upper bound - this is 
>>>>> equivalent to current stream-time based punctuate
>>>>>
>>>>> Punctuation is triggered when either:
>>>>> - the stream time advances past the (stream time of the previous
>>>>> punctuation) + interval;
>>>>> - or (iff the toleranceInterval is set) when the system time advances
>>>>> past the (system time of the previous punctuation) + interval +
>>>>> toleranceInterval
>>>>>
>>>>> In either case:
>>>>> - we trigger punctuate passing as the argument the stream time at which
>>>>> the current punctuation was meant to happen
>>>>> - next punctuate is scheduled at (stream time at which the current
>>>>> punctuation was meant to happen) + interval
>>>>>
>>>>> It may appear complicated at first but I do think these semantics will
>>>>> still be more understandable to users than having 2 separate punctuation
>>>>> schedules/callbacks with different PunctuationTypes.
>>>>>
>>>>>
>>>>>
>>>>> PS. Having re-read this, maybe the following alternative would be easier
>>>>> to understand (WDYT?):
>>>>>
>>>>> schedule(Punctuator callback, long streamTimeInterval, long 
>>>>> systemTimeUpperBound); // schedules punctuate at stream-time intervals 
>>>>> with a system-time upper bound - systemTimeUpperBound must be no less 
>>>>> than streamTimeInterval
>>>>>
>>>>> schedule(Punctuator callback, long streamTimeInterval); // schedules 
>>>>> punctuate at stream-time intervals without a system-time upper bound - 
>>>>> this is equivalent to current stream-time based punctuate
>>>>>
>>>>> Punctuation is triggered when either:
>>>>> - the stream time advances past the (stream time of the previous
>>>>> punctuation) + streamTimeInterval;
>>>>> - or (iff systemTimeUpperBound is set) when the system time advances
>>>>> past the (system time of the previous punctuation) + systemTimeUpperBound
>>>>>
>>>>> Awaiting comments.
>>>>>
>>>>> Thanks,
>>>>> Michal
>>>>>
>>>>> On 21/04/17 16:56, Michal Borowiecki wrote:
>>>>>> Yes, that's what I meant. Just wanted to highlight we'd deprecate it
>>>>>> in favour of something that doesn't return a record. Not a problem 
>>>>>> though.
>>>>>>
>>>>>>
>>>>>> On 21/04/17 16:32, Damian Guy wrote:
>>>>>>> Thanks Michal,
>>>>>>> I agree Transformer.punctuate should also be void, but we can deprecate
>>>>>>> that too in favor of the new interface.
>>>>>>>
>>>>>>> Thanks for the javadoc PR!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Damian
>>>>>>>
>>>>>>> On Fri, 21 Apr 2017 at 09:31 Michal Borowiecki <
>>>>>>> michal.borowie...@openbet.com> wrote:
>>>>>>>
>>>>>>>> Yes, that looks better to me.
>>>>>>>>
>>>>>>>> Note that punctuate on Transformer is currently returning a record, 
>>>>>>>> but I
>>>>>>>> think it's ok to have all output records be sent via
>>>>>>>> ProcessorContext.forward, which has to be used anyway if you want to 
>>>>>>>> send
>>>>>>>> multiple records from one invocation of punctuate.
>>>>>>>>
>>>>>>>> This way it's consistent between Processor and Transformer.
>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, looking at this I found a glitch in the javadoc and put a comment
>>>>>>>> there:
>>>>>>>>
>>>>>>>> https://github.com/apache/kafka/pull/2413/files#r112634612
>>>>>>>>
>>>>>>>> and PR: https://github.com/apache/kafka/pull/2884
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Michal
>>>>>>>> On 20/04/17 18:55, Damian Guy wrote:
>>>>>>>>
>>>>>>>> Hi Michal,
>>>>>>>>
>>>>>>>> Thanks for the KIP. I'd like to propose a bit more of a radical change 
>>>>>>>> to
>>>>>>>> the API.
>>>>>>>> 1. deprecate the punctuate method on Processor
>>>>>>>> 2. create a new Functional Interface just for Punctuation, something 
>>>>>>>> like:
>>>>>>>> interface Punctuator {
>>>>>>>>     void punctuate(long timestamp)
>>>>>>>> }
>>>>>>>> 3. add a new schedule function to ProcessorContext: schedule(long
>>>>>>>> interval, PunctuationType type, Punctuator callback)
>>>>>>>> 4. deprecate the existing schedule function
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Sun, 16 Apr 2017 at 21:55 Michal Borowiecki <
>>>>>>>> michal.borowie...@openbet.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Thomas,
>>>>>>>>>
>>>>>>>>> I would say our use cases fall in the same category as yours.
>>>>>>>>>
>>>>>>>>> 1) One is expiry of old records, it's virtually identical to yours.
>>>>>>>>>
>>>>>>>>> 2) Second one is somewhat more convoluted but boils down to the same 
>>>>>>>>> type
>>>>>>>>> of design:
>>>>>>>>>
>>>>>>>>> Incoming messages carry a number of fields, including a timestamp.
>>>>>>>>>
>>>>>>>>> Outgoing messages contain derived fields, one of them (X) is depended 
>>>>>>>>> on
>>>>>>>>> by the timestamp input field (Y) and some other input field (Z).
>>>>>>>>>
>>>>>>>>> Since the output field X is derived in some non-trivial way, we don't
>>>>>>>>> want to force the logic onto downstream apps. Instead we want to 
>>>>>>>>> calculate
>>>>>>>>> it in the Kafka Streams app, which means we re-calculate X as soon as 
>>>>>>>>> the
>>>>>>>>> timestamp in Y is reached (wall clock time) and send a message if it
>>>>>>>>> changed (I say "if" because the derived field (X) is also conditional 
>>>>>>>>> on
>>>>>>>>> another input field Z).
>>>>>>>>>
>>>>>>>>> So we have kv stores with the records and an additional kv store with
>>>>>>>>> timestamp->id mapping which act like an index where we periodically 
>>>>>>>>> do a
>>>>>>>>> ranged query.
>>>>>>>>>
>>>>>>>>> Initially we naively tried doing it in punctuate which of course 
>>>>>>>>> didn't
>>>>>>>>> work when there were no regular msgs on the input topic.
>>>>>>>>> Since this was before 0.10.1 and state stores weren't query-able from
>>>>>>>>> outside we created a "ticker" that produced msgs once per second onto
>>>>>>>>> another topic and fed it into the same topology to trigger punctuate.
>>>>>>>>> This didn't work either, which was much more surprising to us at the
>>>>>>>>> time, because it was not obvious at all that punctuate is only 
>>>>>>>>> triggered if
>>>>>>>>> *all* input partitions receive messages regularly.
>>>>>>>>> In the end we had to break this into 2 separate Kafka Streams. Main
>>>>>>>>> transformer doesn't use punctuate but sends values of timestamp field 
>>>>>>>>> Y and
>>>>>>>>> the id to a "scheduler" topic where also the periodic ticks are sent. 
>>>>>>>>> This
>>>>>>>>> is consumed by the second topology and is its only input topic. 
>>>>>>>>> There's a
>>>>>>>>> transformer on that topic which populates and updates the time-based
>>>>>>>>> indexes and polls them from punctuate. If the time in the timestamp
>>>>>>>>> elapsed, the record id is sent to the main transformer, which
>>>>>>>>> updates/deletes the record from the main kv store and forwards the
>>>>>>>>> transformed record to the output topic.
>>>>>>>>>
>>>>>>>>> To me this setup feels horrendously complicated for what it does.
>>>>>>>>>
>>>>>>>>> We could incrementally improve on this since 0.10.1 to poll the
>>>>>>>>> timestamp->id "index" stores from some code outside the KafkaStreams
>>>>>>>>> topology so that at least we wouldn't need the extra topic for 
>>>>>>>>> "ticks".
>>>>>>>>> However, the ticks don't feel so hacky when you realise they give you
>>>>>>>>> some hypothetical benefits in predictability. You can reprocess the
>>>>>>>>> messages in a reproducible manner, since the topologies use 
>>>>>>>>> event-time,
>>>>>>>>> just that the event time is simply the wall-clock time fed into a 
>>>>>>>>> topic by
>>>>>>>>> the ticks. (NB in our use case we haven't yet found a need for this 
>>>>>>>>> kind of
>>>>>>>>> reprocessing).
>>>>>>>>> To make that work though, we would have to have the stream time 
>>>>>>>>> advance
>>>>>>>>> based on the presence of msgs on the "tick" topic, regardless of the
>>>>>>>>> presence of messages on the other input topic.
>>>>>>>>>
>>>>>>>>> Same as in the expiry use case, both the wall-clock triggered 
>>>>>>>>> punctuate
>>>>>>>>> and the hybrid would work to simplify this a lot.
>>>>>>>>>
>>>>>>>>> 3) Finally, I have a 3rd use case in the making but I'm still looking 
>>>>>>>>> if
>>>>>>>>> we can achieve it using session windows instead. I'll keep you posted 
>>>>>>>>> if we
>>>>>>>>> have to go with punctuate there too.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Michal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/04/17 20:52, Thomas Becker wrote:
>>>>>>>>>
>>>>>>>>> Here's an example that we currently have.  We have a streams processor
>>>>>>>>> that does a transform from one topic into another. One of the fields 
>>>>>>>>> in
>>>>>>>>> the source topic record is an expiration time, and one of the 
>>>>>>>>> functions
>>>>>>>>> of the processor is to ensure that expired records get deleted 
>>>>>>>>> promptly
>>>>>>>>> after that time passes (typically days or weeks after the message was
>>>>>>>>> originally produced). To do that, the processor keeps a state store of
>>>>>>>>> keys and expiration times, iterates that store in punctuate(), and
>>>>>>>>> emits delete (null) records for expired items. This needs to happen at
>>>>>>>>> some minimum interval regardless of the incoming message rate of the
>>>>>>>>> source topic.
>>>>>>>>>
>>>>>>>>> In this scenario, the expiration of records is the primary function of
>>>>>>>>> punctuate, and therefore the key requirement is that the wall-clock
>>>>>>>>> measured time between punctuate calls have some upper-bound. So a pure
>>>>>>>>> wall-clock based schedule would be fine for our needs. But the 
>>>>>>>>> proposed
>>>>>>>>> "hybrid" system would also be acceptable if that satisfies a broader
>>>>>>>>> range of use-cases.
>>>>>>>>>
>>>>>>>>> On Tue, 2017-04-11 at 14:41 +0200, Michael Noll wrote:
>>>>>>>>>
>>>>>>>>> I apologize for the longer email below.  To my defense, it started
>>>>>>>>> out much
>>>>>>>>> shorter. :-)  Also, to be super-clear, I am intentionally playing
>>>>>>>>> devil's
>>>>>>>>> advocate for a number of arguments brought forth in order to help
>>>>>>>>> improve
>>>>>>>>> this KIP -- I am not implying I necessarily disagree with the
>>>>>>>>> arguments.
>>>>>>>>>
>>>>>>>>> That aside, here are some further thoughts.
>>>>>>>>>
>>>>>>>>> First, there are (at least?) two categories for actions/behavior you
>>>>>>>>> invoke
>>>>>>>>> via punctuate():
>>>>>>>>>
>>>>>>>>> 1. For internal housekeeping of your Processor or Transformer (e.g.,
>>>>>>>>> to
>>>>>>>>> periodically commit to a custom store, to do metrics/logging).  Here,
>>>>>>>>> the
>>>>>>>>> impact of punctuate is typically not observable by other processing
>>>>>>>>> nodes
>>>>>>>>> in the topology.
>>>>>>>>> 2. For controlling the emit frequency of downstream records.  Here,
>>>>>>>>> the
>>>>>>>>> punctuate is all about being observable by downstream processing
>>>>>>>>> nodes.
>>>>>>>>>
>>>>>>>>> A few releases back, we introduced record caches (DSL) and state
>>>>>>>>> store
>>>>>>>>> caches (Processor API) in KIP-63.  Here, we addressed a concern
>>>>>>>>> relating to
>>>>>>>>> (2) where some users needed to control -- here: limit -- the
>>>>>>>>> downstream
>>>>>>>>> output rate of Kafka Streams because the downstream systems/apps
>>>>>>>>> would not
>>>>>>>>> be able to keep up with the upstream output rate (Kafka scalability >
>>>>>>>>> their
>>>>>>>>> scalability).  The argument for KIP-63, which notably did not
>>>>>>>>> introduce a
>>>>>>>>> "trigger" API, was that such an interaction with downstream systems
>>>>>>>>> is an
>>>>>>>>> operational concern;  it should not impact the processing *logic* of
>>>>>>>>> your
>>>>>>>>> application, and thus we didn't want to complicate the Kafka Streams
>>>>>>>>> API,
>>>>>>>>> especially not the declarative DSL, with such operational concerns.
>>>>>>>>>
>>>>>>>>> This KIP's discussion on `punctuate()` takes us back in time (<--
>>>>>>>>> sorry, I
>>>>>>>>> couldn't resist to not make this pun :-P).  As a meta-comment, I am
>>>>>>>>> observing that our conversation is moving more and more into the
>>>>>>>>> direction
>>>>>>>>> of explicit "triggers" because, so far, I have seen only motivations
>>>>>>>>> for
>>>>>>>>> use cases in category (2), but none yet for (1)?  For example, some
>>>>>>>>> comments voiced here are about sth like "IF stream-time didn't
>>>>>>>>> trigger
>>>>>>>>> punctuate, THEN trigger punctuate based on processing-time".  Do we
>>>>>>>>> want
>>>>>>>>> this, and if so, for which use cases and benefits?  Also, on a
>>>>>>>>> related
>>>>>>>>> note, whatever we are discussing here will impact state store caches
>>>>>>>>> (Processor API) and perhaps also impact record caches (DSL), thus we
>>>>>>>>> should
>>>>>>>>> clarify any such impact here.
>>>>>>>>>
>>>>>>>>> Switching topics slightly.
>>>>>>>>>
>>>>>>>>> Jay wrote:
>>>>>>>>>
>>>>>>>>> One thing I've always found super important for this kind of design
>>>>>>>>> work
>>>>>>>>> is to do a really good job of cataloging the landscape of use cases
>>>>>>>>> and
>>>>>>>>> how prevalent each one is.
>>>>>>>>>
>>>>>>>>> +1 to this, as others have already said.
>>>>>>>>>
>>>>>>>>> Here, let me highlight -- just in case -- that when we talked about
>>>>>>>>> windowing use cases in the recent emails, the Processor API (where
>>>>>>>>> `punctuate` resides) does not have any notion of windowing at
>>>>>>>>> all.  If you
>>>>>>>>> want to do windowing *in the Processor API*, you must do so manually
>>>>>>>>> in
>>>>>>>>> combination with window stores.  For this reason I'd suggest to
>>>>>>>>> discuss use
>>>>>>>>> cases not just in general, but also in view of how you'd do so in the
>>>>>>>>> Processor API vs. in the DSL.  Right now, changing/improving
>>>>>>>>> `punctuate`
>>>>>>>>> does not impact the DSL at all, unless we add new functionality to
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> Jay wrote in his strawman example:
>>>>>>>>>
>>>>>>>>> You aggregate click and impression data for a reddit like site.
>>>>>>>>> Every ten
>>>>>>>>> minutes you want to output a ranked list of the top 10 articles
>>>>>>>>> ranked by
>>>>>>>>> clicks/impressions for each geographical area. I want to be able
>>>>>>>>> run this
>>>>>>>>> in steady state as well as rerun to regenerate results (or catch up
>>>>>>>>> if it
>>>>>>>>> crashes).
>>>>>>>>>
>>>>>>>>> This is a good example for more than the obvious reason:  In KIP-63,
>>>>>>>>> we
>>>>>>>>> argued that the reason for saying "every ten minutes" above is not
>>>>>>>>> necessarily about because you want to output data *exactly* after ten
>>>>>>>>> minutes, but that you want to perform an aggregation based on 10-
>>>>>>>>> minute
>>>>>>>>> windows of input data; i.e., the point is about specifying the input
>>>>>>>>> for
>>>>>>>>> your aggregation, not or less about when the results of the
>>>>>>>>> aggregation
>>>>>>>>> should be send downstream.  To take an extreme example, you could
>>>>>>>>> disable
>>>>>>>>> record caches and let your app output a downstream update for every
>>>>>>>>> incoming input record.  If the last input record was from at minute 7
>>>>>>>>> of 10
>>>>>>>>> (for a 10-min window), then what your app would output at minute 10
>>>>>>>>> would
>>>>>>>>> be identical to what it had already emitted at minute 7 earlier
>>>>>>>>> anyways.
>>>>>>>>> This is particularly true when we take late-arriving data into
>>>>>>>>> account:  if
>>>>>>>>> a late record arrived at minute 13, your app would (by default) send
>>>>>>>>> a new
>>>>>>>>> update downstream, even though the "original" 10 minutes have already
>>>>>>>>> passed.
>>>>>>>>>
>>>>>>>>> Jay wrote...:
>>>>>>>>>
>>>>>>>>> There are a couple of tricky things that seem to make this hard
>>>>>>>>> with
>>>>>>>>>
>>>>>>>>> either
>>>>>>>>>
>>>>>>>>> of the options proposed:
>>>>>>>>> 1. If I emit this data using event time I have the problem
>>>>>>>>> described where
>>>>>>>>> a geographical region with no new clicks or impressions will fail
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> output
>>>>>>>>>
>>>>>>>>> results.
>>>>>>>>>
>>>>>>>>> ...and Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We window by the event time, but trigger punctuate in <punctuate
>>>>>>>>> interval>
>>>>>>>>> duration of system time, in the absence of an event crossing the
>>>>>>>>> punctuate
>>>>>>>>> event time.
>>>>>>>>>
>>>>>>>>> So, given what I wrote above about the status quo and what you can
>>>>>>>>> already
>>>>>>>>> do with it, is the concern that the state store cache doesn't give
>>>>>>>>> you
>>>>>>>>> *direct* control over "forcing an output after no later than X
>>>>>>>>> seconds [of
>>>>>>>>> processing-time]" but only indirect control through a cache
>>>>>>>>> size?  (Note
>>>>>>>>> that I am not dismissing the claims why this might be helpful.)
>>>>>>>>>
>>>>>>>>> Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>> We are using Kafka Stream for our Audit Trail, where we need to
>>>>>>>>> output the
>>>>>>>>> event counts on each topic on each cluster aggregated over a 1
>>>>>>>>> minute
>>>>>>>>> window. We have to use event time to be able to cross check the
>>>>>>>>> counts.
>>>>>>>>>
>>>>>>>>> But
>>>>>>>>>
>>>>>>>>> we need to trigger punctuate [aggregate event pushes] by system
>>>>>>>>> time in
>>>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> absence of events. Otherwise the event counts for unexpired windows
>>>>>>>>> would
>>>>>>>>> be 0 which is bad.
>>>>>>>>>
>>>>>>>>> Isn't the latter -- "count would be 0" -- the problem between the
>>>>>>>>> absence
>>>>>>>>> of output vs. an output of 0, similar to the use of `Option[T]` in
>>>>>>>>> Scala
>>>>>>>>> and the difference between `None` and `Some(0)`?  That is, isn't the
>>>>>>>>> root
>>>>>>>>> cause that the downstream system interprets the absence of output in
>>>>>>>>> a
>>>>>>>>> particular way ("No output after 1 minute = I consider the output to
>>>>>>>>> be
>>>>>>>>> 0.")?  Arguably, you could also adapt the downstream system (if
>>>>>>>>> possible)
>>>>>>>>> to correctly handle the difference between absence of output vs.
>>>>>>>>> output of
>>>>>>>>> 0.  I am not implying that we shouldn't care about such a use case,
>>>>>>>>> but
>>>>>>>>> want to understand the motivation better. :-)
>>>>>>>>>
>>>>>>>>> Also, to add some perspective, in some related discussions we talked
>>>>>>>>> about
>>>>>>>>> how a Kafka Streams application should not worry or not be coupled
>>>>>>>>> unnecessarily with such interpretation specifics in a downstream
>>>>>>>>> system's
>>>>>>>>> behavior.  After all, tomorrow your app's output might be consumed by
>>>>>>>>> more
>>>>>>>>> than just this one downstream system.  Arguably, Kafka Connect rather
>>>>>>>>> than
>>>>>>>>> Kafka Streams might be the best tool to link the universes of Kafka
>>>>>>>>> and
>>>>>>>>> downstream systems, including helping to reconcile the differences in
>>>>>>>>> how
>>>>>>>>> these systems interpret changes, updates, late-arriving data,
>>>>>>>>> etc.  Kafka
>>>>>>>>> Connect would allow you to decouple the Kafka Streams app's logical
>>>>>>>>> processing from the specifics of downstream systems, thanks to
>>>>>>>>> specific
>>>>>>>>> sink connectors (e.g. for Elastic, Cassandra, MySQL, S3, etc).  Would
>>>>>>>>> this
>>>>>>>>> decoupling with Kafka Connect help here?  (And if the answer is "Yes,
>>>>>>>>> but
>>>>>>>>> it's currently awkward to use Connect for this", this might be a
>>>>>>>>> problem we
>>>>>>>>> can solve, too.)
>>>>>>>>>
>>>>>>>>> Switching topics slightly again.
>>>>>>>>>
>>>>>>>>> Thomas wrote:
>>>>>>>>>
>>>>>>>>> I'm not entirely convinced that a separate callback (option C)
>>>>>>>>> is that messy (it could just be a default method with an empty
>>>>>>>>> implementation), but if we wanted a single API to handle both
>>>>>>>>> cases,
>>>>>>>>> how about something like the following?
>>>>>>>>>
>>>>>>>>> enum Time {
>>>>>>>>>    STREAM,
>>>>>>>>>    CLOCK
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Yeah, I am on the fence here, too.  If we use the 1-method approach,
>>>>>>>>> then
>>>>>>>>> whatever the user is doing inside this method is a black box to Kafka
>>>>>>>>> Streams (similar to how we have no idea what the user does inside a
>>>>>>>>> `foreach` -- if the function passed to `foreach` writes to external
>>>>>>>>> systems, then Kafka Streams is totally unaware of the fact).  We
>>>>>>>>> won't
>>>>>>>>> know, for example, if the stream-time action has a smaller "trigger"
>>>>>>>>> frequency than the processing-time action.  Or, we won't know whether
>>>>>>>>> the
>>>>>>>>> user custom-codes a "not later than" trigger logic ("Do X every 1-
>>>>>>>>> minute of
>>>>>>>>> stream-time or 1-minute of processing-time, whichever comes
>>>>>>>>> first").  That
>>>>>>>>> said, I am not certain yet whether we would need such knowledge
>>>>>>>>> because,
>>>>>>>>> when using the Processor API, most of the work and decisions must be
>>>>>>>>> done
>>>>>>>>> by the user anyways.  It would matter though if the concept of
>>>>>>>>> "triggers"
>>>>>>>>> were to bubble up into the DSL because in the DSL the management of
>>>>>>>>> windowing, window stores, etc. must be done automatically by Kafka
>>>>>>>>> Streams.
>>>>>>>>>
>>>>>>>>> [In any case, btw, we have the corner case where the user configured
>>>>>>>>> the
>>>>>>>>> stream-time to be processing-time (e.g. via wall-clock timestamp
>>>>>>>>> extractor), at which point both punctuate variants are based on the
>>>>>>>>> same
>>>>>>>>> time semantics / timeline.]
>>>>>>>>>
>>>>>>>>> Again, I apologize for the wall of text.  Congratulations if you made
>>>>>>>>> it
>>>>>>>>> this far. :-)
>>>>>>>>>
>>>>>>>>> More than happy to hear your thoughts!
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> On Tue, Apr 11, 2017 at 1:12 AM, Arun Mathew <arunmathe...@gmail.com> 
>>>>>>>>> <arunmathe...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Matthias.
>>>>>>>>> Sure, will correct it right away.
>>>>>>>>>
>>>>>>>>> On 11-Apr-2017 8:07 AM, "Matthias J. Sax" <matth...@confluent.io> 
>>>>>>>>> <matth...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks for preparing this page!
>>>>>>>>>
>>>>>>>>> About terminology:
>>>>>>>>>
>>>>>>>>> You introduce the term "event time" -- but we should call this
>>>>>>>>> "stream
>>>>>>>>> time" -- "stream time" is whatever TimestampExtractor returns and
>>>>>>>>> this
>>>>>>>>> could be event time, ingestion time, or processing/wall-clock time.
>>>>>>>>>
>>>>>>>>> Does this make sense to you?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/10/17 4:58 AM, Arun Mathew wrote:
>>>>>>>>>
>>>>>>>>> Thanks Ewen.
>>>>>>>>>
>>>>>>>>> @Michal, @all, I have created a child page to start the Use Cases
>>>>>>>>>
>>>>>>>>> discussion [https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>>>>> Punctuate+Use+Cases]. Please go through it and give your comments.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @Tianji, Sorry for the delay. I am trying to make the patch
>>>>>>>>> public.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Arun Mathew
>>>>>>>>>
>>>>>>>>> On 4/8/17, 02:00, "Ewen Cheslack-Postava" <e...@confluent.io> 
>>>>>>>>> <e...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     Arun,
>>>>>>>>>
>>>>>>>>>     I've given you permission to edit the wiki. Let me know if
>>>>>>>>> you run
>>>>>>>>>
>>>>>>>>> into any
>>>>>>>>>
>>>>>>>>>     issues.
>>>>>>>>>
>>>>>>>>>     -Ewen
>>>>>>>>>
>>>>>>>>>     On Fri, Apr 7, 2017 at 1:21 AM, Arun Mathew <amathew@yahoo-co 
>>>>>>>>> rp.jp> <amat...@yahoo-corp.jp>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     > Thanks Michal. I don’t have the access yet [arunmathew88].
>>>>>>>>> Should I
>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>     > sending a separate mail for this?
>>>>>>>>>     >
>>>>>>>>>     > I thought one of the person following this thread would be
>>>>>>>>> able to
>>>>>>>>>
>>>>>>>>> give me
>>>>>>>>>
>>>>>>>>>     > access.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > *From: *Michal Borowiecki <michal.borowie...@openbet.com> 
>>>>>>>>> <michal.borowie...@openbet.com>
>>>>>>>>>     > *Reply-To: *"dev@kafka.apache.org" <dev@kafka.apache.org> 
>>>>>>>>> <dev@kafka.apache.org> <dev@kafka.apache.org>
>>>>>>>>>     > *Date: *Friday, April 7, 2017 at 17:16
>>>>>>>>>     > *To: *"dev@kafka.apache.org" <dev@kafka.apache.org> 
>>>>>>>>> <dev@kafka.apache.org> <dev@kafka.apache.org>
>>>>>>>>>     > *Subject: *Re: [DISCUSS] KIP-138: Change punctuate
>>>>>>>>> semantics
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Hi Arun,
>>>>>>>>>     >
>>>>>>>>>     > I was thinking along the same lines as you, listing the use
>>>>>>>>> cases
>>>>>>>>>
>>>>>>>>> on the
>>>>>>>>>
>>>>>>>>>     > wiki, but didn't find time to get around doing that yet.
>>>>>>>>>     > Don't mind if you do it if you have access now.
>>>>>>>>>     > I was thinking it would be nice if, once we have the use
>>>>>>>>> cases
>>>>>>>>>
>>>>>>>>> listed,
>>>>>>>>>
>>>>>>>>>     > people could use likes to up-vote the use cases similar to
>>>>>>>>> what
>>>>>>>>>
>>>>>>>>> they're
>>>>>>>>>
>>>>>>>>>     > working on.
>>>>>>>>>     >
>>>>>>>>>     > I should have a bit more time to action this in the next
>>>>>>>>> few days,
>>>>>>>>>
>>>>>>>>> but
>>>>>>>>>
>>>>>>>>>     > happy for you to do it if you can beat me to it ;-)
>>>>>>>>>     >
>>>>>>>>>     > Cheers,
>>>>>>>>>     > Michal
>>>>>>>>>     >
>>>>>>>>>     > On 07/04/17 04:39, Arun Mathew wrote:
>>>>>>>>>     >
>>>>>>>>>     > Sure, Thanks Matthias. My id is [arunmathew88].
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Of course. I was thinking of a subpage where people can
>>>>>>>>>
>>>>>>>>> collaborate.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Will do as per Michael’s suggestion.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > Regards,
>>>>>>>>>     >
>>>>>>>>>     > Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > On 4/7/17, 12:30, "Matthias J. Sax" <matth...@confluent.io> 
>>>>>>>>> <matth...@confluent.io>
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>> matth...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     Please share your Wiki-ID and a committer can give you
>>>>>>>>> write
>>>>>>>>>
>>>>>>>>> access.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     Btw: as you did not initiate the KIP, you should not
>>>>>>>>> change the
>>>>>>>>>
>>>>>>>>> KIP
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     without the permission of the original author -- in
>>>>>>>>> this case
>>>>>>>>>
>>>>>>>>> Michael.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     So you might also just share your thought over the
>>>>>>>>> mailing list
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     Michael can update the KIP page. Or, as an alternative,
>>>>>>>>> just
>>>>>>>>>
>>>>>>>>> create a
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     subpage for the KIP page.
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     @Michael: WDYT?
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     -Matthias
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >     On 4/6/17 8:05 PM, Arun Mathew wrote:
>>>>>>>>>     >
>>>>>>>>>     >     > Hi Jay,
>>>>>>>>>     >
>>>>>>>>>     >     >           Thanks for the advise, I would like to list
>>>>>>>>> down
>>>>>>>>>
>>>>>>>>> the use cases as
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > per your suggestion. But it seems I don't have write
>>>>>>>>>
>>>>>>>>> permission to the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > Apache Kafka Confluent Space. Whom shall I request
>>>>>>>>> for it?
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > Regarding your last question. We are using a patch in
>>>>>>>>> our
>>>>>>>>>
>>>>>>>>> production system
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > which does exactly this.
>>>>>>>>>     >
>>>>>>>>>     >     > We window by the event time, but trigger punctuate in
>>>>>>>>>
>>>>>>>>> <punctuate interval>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > duration of system time, in the absence of an event
>>>>>>>>> crossing
>>>>>>>>>
>>>>>>>>> the punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > event time.
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > We are using Kafka Stream for our Audit Trail, where
>>>>>>>>> we need
>>>>>>>>>
>>>>>>>>> to output the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > event counts on each topic on each cluster aggregated
>>>>>>>>> over a
>>>>>>>>>
>>>>>>>>> 1 minute
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > window. We have to use event time to be able to cross
>>>>>>>>> check
>>>>>>>>>
>>>>>>>>> the counts. But
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > we need to trigger punctuate [aggregate event pushes]
>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> system time in the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > absence of events. Otherwise the event counts for
>>>>>>>>> unexpired
>>>>>>>>>
>>>>>>>>> windows would
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > be 0 which is bad.
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > "Maybe a hybrid solution works: I window by event
>>>>>>>>> time but
>>>>>>>>>
>>>>>>>>> trigger results
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > by system time for windows that have updated? Not
>>>>>>>>> really sure
>>>>>>>>>
>>>>>>>>> the details
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > of making that work. Does that work? Are there
>>>>>>>>> concrete
>>>>>>>>>
>>>>>>>>> examples where you
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > actually want the current behavior?"
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > --
>>>>>>>>>     >
>>>>>>>>>     >     > With Regards,
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >     > Yahoo! JAPAN Corporation
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     > On Wed, Apr 5, 2017 at 8:48 PM, Tianji Li <
>>>>>>>>>
>>>>>>>>> skyah...@gmail.com><skyah...@gmail.com> <skyah...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >     >> Hi Jay,
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> The hybrid solution is exactly what I expect and
>>>>>>>>> need for
>>>>>>>>>
>>>>>>>>> our use cases
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> when dealing with telecom data.
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> Thanks
>>>>>>>>>     >
>>>>>>>>>     >     >> Tianji
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >> On Wed, Apr 5, 2017 at 12:01 AM, Jay Kreps <
>>>>>>>>>
>>>>>>>>> j...@confluent.io><j...@confluent.io> <j...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Hey guys,
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> One thing I've always found super important for
>>>>>>>>> this kind
>>>>>>>>>
>>>>>>>>> of design work
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> is
>>>>>>>>>     >
>>>>>>>>>     >     >>> to do a really good job of cataloging the landscape
>>>>>>>>> of use
>>>>>>>>>
>>>>>>>>> cases and how
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> prevalent each one is. By that I mean not just
>>>>>>>>> listing lots
>>>>>>>>>
>>>>>>>>> of uses, but
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> also grouping them into categories that
>>>>>>>>> functionally need
>>>>>>>>>
>>>>>>>>> the same thing.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> In the absence of this it is very hard to reason
>>>>>>>>> about
>>>>>>>>>
>>>>>>>>> design proposals.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> From the proposals so far I think we have a lot of
>>>>>>>>>
>>>>>>>>> discussion around
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> possible apis, but less around what the user needs
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>> different use
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> cases
>>>>>>>>>     >
>>>>>>>>>     >     >>> and how they would implement that using the api.
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Here is an example:
>>>>>>>>>     >
>>>>>>>>>     >     >>> You aggregate click and impression data for a
>>>>>>>>> reddit like
>>>>>>>>>
>>>>>>>>> site. Every ten
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> minutes you want to output a ranked list of the top
>>>>>>>>> 10
>>>>>>>>>
>>>>>>>>> articles ranked by
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> clicks/impressions for each geographical area. I
>>>>>>>>> want to be
>>>>>>>>>
>>>>>>>>> able run this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> in steady state as well as rerun to regenerate
>>>>>>>>> results (or
>>>>>>>>>
>>>>>>>>> catch up if it
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> crashes).
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> There are a couple of tricky things that seem to
>>>>>>>>> make this
>>>>>>>>>
>>>>>>>>> hard with
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> either
>>>>>>>>>     >
>>>>>>>>>     >     >>> of the options proposed:
>>>>>>>>>     >
>>>>>>>>>     >     >>> 1. If I emit this data using event time I have the
>>>>>>>>> problem
>>>>>>>>>
>>>>>>>>> described
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> where
>>>>>>>>>     >
>>>>>>>>>     >     >>> a geographical region with no new clicks or
>>>>>>>>> impressions
>>>>>>>>>
>>>>>>>>> will fail to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> output
>>>>>>>>>     >
>>>>>>>>>     >     >>> results.
>>>>>>>>>     >
>>>>>>>>>     >     >>> 2. If I emit this data using system time I have the
>>>>>>>>> problem
>>>>>>>>>
>>>>>>>>> that when
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> reprocessing data my window may not be ten minutes
>>>>>>>>> but 10
>>>>>>>>>
>>>>>>>>> hours if my
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> processing is very fast so it dramatically changes
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> output.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> Maybe a hybrid solution works: I window by event
>>>>>>>>> time but
>>>>>>>>>
>>>>>>>>> trigger results
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> by system time for windows that have updated? Not
>>>>>>>>> really
>>>>>>>>>
>>>>>>>>> sure the details
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> of making that work. Does that work? Are there
>>>>>>>>> concrete
>>>>>>>>>
>>>>>>>>> examples where
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> you
>>>>>>>>>     >
>>>>>>>>>     >     >>> actually want the current behavior?
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> -Jay
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> On Tue, Apr 4, 2017 at 8:32 PM, Arun Mathew <
>>>>>>>>>
>>>>>>>>> arunmathe...@gmail.com> <arunmathe...@gmail.com> 
>>>>>>>>> <arunmathe...@gmail.com>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Hi All,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Thanks for the KIP. We were also in need of a
>>>>>>>>> mechanism to
>>>>>>>>>
>>>>>>>>> trigger
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> punctuate in the absence of events.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> As I described in [
>>>>>>>>>     >
>>>>>>>>>     >     >>>> https://issues.apache.org/jira/browse/KAFKA-3514?
>>>>>>>>>     >
>>>>>>>>>     >     >>>> focusedCommentId=15926036&page=com.atlassian.jira.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> plugin.system.issuetabpanels:comment-
>>>>>>>>> tabpanel#comment-
>>>>>>>>>
>>>>>>>>> 15926036
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> ],
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - Our approached involved using the event time
>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> default.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - The method to check if there is any punctuate
>>>>>>>>> ready
>>>>>>>>>
>>>>>>>>> in the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    PunctuationQueue is triggered via the any event
>>>>>>>>>
>>>>>>>>> received by the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> stream
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    tread, or at the polling intervals in the
>>>>>>>>> absence of
>>>>>>>>>
>>>>>>>>> any events.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - When we create Punctuate objects (which
>>>>>>>>> contains the
>>>>>>>>>
>>>>>>>>> next event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    for punctuation and interval), we also record
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> creation time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> (system
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    time).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - While checking for maturity of Punctuate
>>>>>>>>> Schedule by
>>>>>>>>>     >
>>>>>>>>>     >     >> mayBePunctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    method, we also check if the system clock has
>>>>>>>>> elapsed
>>>>>>>>>
>>>>>>>>> the punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    interval since the schedule creation time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In the absence of any event, or in the
>>>>>>>>> absence of any
>>>>>>>>>
>>>>>>>>> event for
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    topic in the partition group assigned to the
>>>>>>>>> stream
>>>>>>>>>
>>>>>>>>> task, the system
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    will elapse the interval and we trigger a
>>>>>>>>> punctuate
>>>>>>>>>
>>>>>>>>> using the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> expected
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    punctuation event time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - we then create the next punctuation schedule
>>>>>>>>> as
>>>>>>>>>
>>>>>>>>> punctuation event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    + punctuation interval, [again recording the
>>>>>>>>> system
>>>>>>>>>
>>>>>>>>> time of creation
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> of
>>>>>>>>>     >
>>>>>>>>>     >     >>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    schedule].
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> We call this a Hybrid Punctuate. Of course, this
>>>>>>>>> approach
>>>>>>>>>
>>>>>>>>> has pros and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> cons.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Pros
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - Punctuates will happen in <punctuate
>>>>>>>>> interval> time
>>>>>>>>>
>>>>>>>>> duration at
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> max
>>>>>>>>>     >
>>>>>>>>>     >     >>> in
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    terms of system time.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - The semantics as a whole continues to revolve
>>>>>>>>> around
>>>>>>>>>
>>>>>>>>> event time.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - We can use the old data [old timestamps] to
>>>>>>>>> rerun any
>>>>>>>>>
>>>>>>>>> experiments
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> or
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    tests.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Cons
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In case the  <punctuate interval> is not a
>>>>>>>>> time
>>>>>>>>>
>>>>>>>>> duration [say
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> logical
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    time/event count], then the approach might not
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>> meaningful.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - In case there is a case where we have to wait
>>>>>>>>> for an
>>>>>>>>>
>>>>>>>>> actual event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> from
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    a low event rate partition in the partition
>>>>>>>>> group, this
>>>>>>>>>
>>>>>>>>> approach
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> will
>>>>>>>>>     >
>>>>>>>>>     >     >>>> jump
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    the gun.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    - in case the event processing cannot catch up
>>>>>>>>> with the
>>>>>>>>>
>>>>>>>>> event rate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    the expected timestamp events gets queued for
>>>>>>>>> long
>>>>>>>>>
>>>>>>>>> time, this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> approach
>>>>>>>>>     >
>>>>>>>>>     >     >>>>    might jump the gun.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I believe the above approach and discussion goes
>>>>>>>>> close to
>>>>>>>>>
>>>>>>>>> the approach
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> A.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I like the idea of having an even count based
>>>>>>>>> punctuate.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I agree with the discussion around approach C,
>>>>>>>>> that we
>>>>>>>>>
>>>>>>>>> should provide
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>> user with the option to choose system time or
>>>>>>>>> event time
>>>>>>>>>
>>>>>>>>> based
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> punctuates.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> But I believe that the user predominantly wants to
>>>>>>>>> use
>>>>>>>>>
>>>>>>>>> event time while
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> not
>>>>>>>>>     >
>>>>>>>>>     >     >>>> missing out on regular punctuates due to event
>>>>>>>>> delays or
>>>>>>>>>
>>>>>>>>> event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> absences.
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Hence a complex punctuate option as Matthias
>>>>>>>>> mentioned
>>>>>>>>>
>>>>>>>>> (quoted below)
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>> be most apt.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> "- We might want to add "complex" schedules later
>>>>>>>>> on
>>>>>>>>>
>>>>>>>>> (like, punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>> every 10 seconds event-time or 60 seconds system-
>>>>>>>>> time
>>>>>>>>>
>>>>>>>>> whatever comes
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> first)."
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> -----------
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> I think I read somewhere that Kafka Streams
>>>>>>>>> started with
>>>>>>>>>
>>>>>>>>> System Time as
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>> punctuation standard, but was later changed to
>>>>>>>>> Event Time.
>>>>>>>>>
>>>>>>>>> I guess
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> there
>>>>>>>>>     >
>>>>>>>>>     >     >>>> would be some good reason behind it. As Kafka
>>>>>>>>> Streams want
>>>>>>>>>
>>>>>>>>> to evolve
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> more
>>>>>>>>>     >
>>>>>>>>>     >     >>>> on the Stream Processing front, I believe the
>>>>>>>>> emphasis on
>>>>>>>>>
>>>>>>>>> event time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>> remain quite strong.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> With Regards,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Arun Mathew
>>>>>>>>>     >
>>>>>>>>>     >     >>>> Yahoo! JAPAN Corporation, Tokyo
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> On Wed, Apr 5, 2017 at 3:53 AM, Thomas Becker <
>>>>>>>>>
>>>>>>>>> tobec...@tivo.com> <tobec...@tivo.com> <tobec...@tivo.com>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> Yeah I like PuncutationType much better; I just
>>>>>>>>> threw
>>>>>>>>>
>>>>>>>>> Time out there
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> more as a strawman than an actual suggestion ;) I
>>>>>>>>> still
>>>>>>>>>
>>>>>>>>> think it's
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> worth considering what this buys us over an
>>>>>>>>> additional
>>>>>>>>>
>>>>>>>>> callback. I
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> foresee a number of punctuate implementations
>>>>>>>>> following
>>>>>>>>>
>>>>>>>>> this pattern:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> public void punctuate(PunctuationType type) {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     switch (type) {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>         case EVENT_TIME:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             methodA();
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             break;
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>         case SYSTEM_TIME:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             methodB();
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>             break;
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> I guess one advantage of this approach is we
>>>>>>>>> could add
>>>>>>>>>
>>>>>>>>> additional
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> punctuation types later in a backwards compatible
>>>>>>>>> way
>>>>>>>>>
>>>>>>>>> (like event
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> count
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> as you mentioned).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> -Tommy
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> On Tue, 2017-04-04 at 11:10 -0700, Matthias J.
>>>>>>>>> Sax wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> That sounds promising.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> I am just wondering if `Time` is the best name.
>>>>>>>>> Maybe we
>>>>>>>>>
>>>>>>>>> want to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> add
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> other non-time based punctuations at some point
>>>>>>>>> later. I
>>>>>>>>>
>>>>>>>>> would
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> suggest
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> enum PunctuationType {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>   EVENT_TIME,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>   SYSTEM_TIME,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> or similar. Just to keep the door open -- it's
>>>>>>>>> easier to
>>>>>>>>>
>>>>>>>>> add new
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> stuff
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> if the name is more generic.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> -Matthias
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>> On 4/4/17 5:30 AM, Thomas Becker wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> I agree that the framework providing and
>>>>>>>>> managing the
>>>>>>>>>
>>>>>>>>> notion of
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> stream
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> time is valuable and not something we would
>>>>>>>>> want to
>>>>>>>>>
>>>>>>>>> delegate to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> tasks. I'm not entirely convinced that a
>>>>>>>>> separate
>>>>>>>>>
>>>>>>>>> callback
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> (option
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> C)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> is that messy (it could just be a default
>>>>>>>>> method with
>>>>>>>>>
>>>>>>>>> an empty
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> implementation), but if we wanted a single API
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> handle both
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> cases,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> how about something like the following?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> enum Time {
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>    STREAM,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>    CLOCK
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> }
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Then on ProcessorContext:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> context.schedule(Time time, long interval)  //
>>>>>>>>> We could
>>>>>>>>>
>>>>>>>>> allow
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> this
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> be called once for each value of time to mix
>>>>>>>>>
>>>>>>>>> approaches.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Then the Processor API becomes:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> punctuate(Time time) // time here denotes which
>>>>>>>>>
>>>>>>>>> schedule resulted
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> in
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> this call.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> Thoughts?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> On Mon, 2017-04-03 at 22:44 -0700, Matthias J.
>>>>>>>>> Sax
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Thanks a lot for the KIP Michal,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> I was thinking about the four options you
>>>>>>>>> proposed in
>>>>>>>>>
>>>>>>>>> more
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> details
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> this are my thoughts:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (A) You argue, that users can still
>>>>>>>>> "punctuate" on
>>>>>>>>>
>>>>>>>>> event-time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> via
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> process(), but I am not sure if this is
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> Note, that
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> only
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> get record timestamps via context.timestamp().
>>>>>>>>> Thus,
>>>>>>>>>
>>>>>>>>> users
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> would
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> need
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> track the time progress per partition (based
>>>>>>>>> on the
>>>>>>>>>
>>>>>>>>> partitions
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> they
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> obverse via context.partition(). (This alone
>>>>>>>>> puts a
>>>>>>>>>
>>>>>>>>> huge burden
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> user by itself.) However, users are not
>>>>>>>>> notified at
>>>>>>>>>
>>>>>>>>> startup
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> partitions are assigned, and user are not
>>>>>>>>> notified
>>>>>>>>>
>>>>>>>>> when
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> partitions
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> get
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> revoked. Because this information is not
>>>>>>>>> available,
>>>>>>>>>
>>>>>>>>> it's not
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> possible
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> "manually advance" stream-time, and thus
>>>>>>>>> event-time
>>>>>>>>>
>>>>>>>>> punctuation
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> within
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> process() seems not to be possible -- or do
>>>>>>>>> you see a
>>>>>>>>>
>>>>>>>>> way to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> get
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> it
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> done? And even if, it might still be too
>>>>>>>>> clumsy to
>>>>>>>>>
>>>>>>>>> use.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (B) This does not allow to mix both
>>>>>>>>> approaches, thus
>>>>>>>>>
>>>>>>>>> limiting
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> can do.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (C) This should give all flexibility we need.
>>>>>>>>> However,
>>>>>>>>>
>>>>>>>>> just
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> adding
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> more method seems to be a solution that is too
>>>>>>>>> simple
>>>>>>>>>
>>>>>>>>> (cf my
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> comments
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> below).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (D) This might be hard to use. Also, I am not
>>>>>>>>> sure how
>>>>>>>>>
>>>>>>>>> a user
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> could
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> enable system-time and event-time punctuation
>>>>>>>>> in
>>>>>>>>>
>>>>>>>>> parallel.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Overall options (C) seems to be the most
>>>>>>>>> promising
>>>>>>>>>
>>>>>>>>> approach to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> me.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Because I also favor a clean API, we might
>>>>>>>>> keep
>>>>>>>>>
>>>>>>>>> current
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate()
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> as-is, but deprecate it -- so we can remove it
>>>>>>>>> at some
>>>>>>>>>
>>>>>>>>> later
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> point
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> when
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> people use the "new punctuate API".
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> Couple of follow up questions:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - I am wondering, if we should have two
>>>>>>>>> callback
>>>>>>>>>
>>>>>>>>> methods or
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> just
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> one
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> (ie, a unified for system and event time
>>>>>>>>> punctuation
>>>>>>>>>
>>>>>>>>> or one for
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> each?).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - If we have one, how can the user figure out,
>>>>>>>>> which
>>>>>>>>>
>>>>>>>>> condition
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> did
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> trigger?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - How would the API look like, for registering
>>>>>>>>>
>>>>>>>>> different
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> schedules? The "type" must be somehow defined?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - We might want to add "complex" schedules
>>>>>>>>> later on
>>>>>>>>>
>>>>>>>>> (like,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuate
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> every 10 seconds event-time or 60 seconds
>>>>>>>>> system-time
>>>>>>>>>
>>>>>>>>> whatever
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> comes
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> first). I don't say we should add this right
>>>>>>>>> away, but
>>>>>>>>>
>>>>>>>>> we might
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> want
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> define the API in a way, that it allows
>>>>>>>>> extensions
>>>>>>>>>
>>>>>>>>> like this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> later
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> on,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> without redesigning the API (ie, the API
>>>>>>>>> should be
>>>>>>>>>
>>>>>>>>> designed
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> extensible)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> - Did you ever consider count-based
>>>>>>>>> punctuation?
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> I understand, that you would like to solve a
>>>>>>>>> simple
>>>>>>>>>
>>>>>>>>> problem,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> but
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> we
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> learned from the past, that just "adding some
>>>>>>>>> API"
>>>>>>>>>
>>>>>>>>> quickly
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> leads
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> to a
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> not very well defined API that needs time
>>>>>>>>> consuming
>>>>>>>>>
>>>>>>>>> clean up
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> later on
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> via other KIPs. Thus, I would prefer to get a
>>>>>>>>> holistic
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> punctuation
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> KIP
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> with this from the beginning on to avoid later
>>>>>>>>> painful
>>>>>>>>>     >
>>>>>>>>>     >     >> redesign.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> -Matthias
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>> On 4/3/17 11:58 AM, Michal Borowiecki wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Thanks Thomas,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> I'm also wary of changing the existing
>>>>>>>>> semantics of
>>>>>>>>>     >
>>>>>>>>>     >     >> punctuate,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> for
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> backward compatibility reasons, although I
>>>>>>>>> like the
>>>>>>>>>     >
>>>>>>>>>     >     >> conceptual
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> simplicity of that option.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Adding a new method to me feels safer but, in
>>>>>>>>> a way,
>>>>>>>>>
>>>>>>>>> uglier.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> I
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> added
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> this to the KIP now as option (C).
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> The TimestampExtractor mechanism is actually
>>>>>>>>> more
>>>>>>>>>
>>>>>>>>> flexible,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> as
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> it
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> allows
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> you to return any value, you're not limited
>>>>>>>>> to event
>>>>>>>>>
>>>>>>>>> time or
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> system
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> (although I don't see an actual use case
>>>>>>>>> where you
>>>>>>>>>
>>>>>>>>> might need
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> anything
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> else then those two). Hence I also proposed
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> option to
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> allow
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> users
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> to, effectively, decide what "stream time" is
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>> them given
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> presence or absence of messages, much like
>>>>>>>>> they can
>>>>>>>>>
>>>>>>>>> decide
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> what
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> msg
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> means for them using the TimestampExtractor.
>>>>>>>>> What do
>>>>>>>>>
>>>>>>>>> you
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> think
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> about
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> that? This is probably most flexible but also
>>>>>>>>> most
>>>>>>>>>     >
>>>>>>>>>     >     >> complicated.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> All comments appreciated.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Cheers,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> Michal
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>> On 03/04/17 19:23, Thomas Becker wrote:
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> Although I fully agree we need a way to
>>>>>>>>> trigger
>>>>>>>>>
>>>>>>>>> periodic
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> processing
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> that is independent from whether and when
>>>>>>>>> messages
>>>>>>>>>
>>>>>>>>> arrive,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> I'm
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> not sure
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> I like the idea of changing the existing
>>>>>>>>> semantics
>>>>>>>>>
>>>>>>>>> across
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> board.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> What if we added an additional callback to
>>>>>>>>> Processor
>>>>>>>>>
>>>>>>>>> that
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> can
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> be
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> scheduled similarly to punctuate() but was
>>>>>>>>> always
>>>>>>>>>
>>>>>>>>> called at
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> fixed, wall
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> clock based intervals? This way you wouldn't
>>>>>>>>> have to
>>>>>>>>>
>>>>>>>>> give
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> up
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> the
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> notion
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> of stream time to be able to do periodic
>>>>>>>>> processing.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>> On Mon, 2017-04-03 at 10:34 +0100, Michal
>>>>>>>>> Borowiecki
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Hi all,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> I have created a draft for KIP-138: Change
>>>>>>>>>
>>>>>>>>> punctuate
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> semantics
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> <https://cwiki.apache.org/
>>>>>>>>>
>>>>>>>>> confluence/display/KAFKA/KIP- <https://cwiki.apache.org/ 
>>>>>>>>> confluence/display/KAFKA/KIP-> 
>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP->
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     > <https://cwiki.apache.org/confluence/display/KAFKA/KI P-> 
>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP->>>
>>>>>>>>>
>>>>>>>>> 138%
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> 3A+C
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> hange+
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> punctuate+semantics>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> .
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Appreciating there can be different views
>>>>>>>>> on
>>>>>>>>>
>>>>>>>>> system-time
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> vs
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> event-
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> time
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> semantics for punctuation depending on use-
>>>>>>>>> case and
>>>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> importance of
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> backwards compatibility of any such change,
>>>>>>>>> I've
>>>>>>>>>
>>>>>>>>> left it
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> quite
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> open
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> and
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> hope to fill in more info as the discussion
>>>>>>>>>
>>>>>>>>> progresses.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Thanks,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>>>>> Michal
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> --
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     Tommy Becker
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     Senior Software Engineer
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     O +1 919.460.4747 <%28919%29%20460-4747> 
>>>>>>>>> <(919)%20460-4747>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>     tivo.com
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> ________________________________
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> This email and any attachments may contain
>>>>>>>>> confidential
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> privileged material for the sole use of the
>>>>>>>>> intended
>>>>>>>>>
>>>>>>>>> recipient.
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> Any
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> review, copying, or distribution of this email
>>>>>>>>> (or any
>>>>>>>>>     >
>>>>>>>>>     >     >> attachments)
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> by others is prohibited. If you are not the
>>>>>>>>> intended
>>>>>>>>>
>>>>>>>>> recipient,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> please contact the sender immediately and
>>>>>>>>> permanently
>>>>>>>>>
>>>>>>>>> delete this
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> email and any attachments. No employee or agent
>>>>>>>>> of TiVo
>>>>>>>>>
>>>>>>>>> Inc. is
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> authorized to conclude any binding agreement on
>>>>>>>>> behalf
>>>>>>>>>
>>>>>>>>> of TiVo
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> Inc.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> by email. Binding agreements with TiVo Inc. may
>>>>>>>>> only be
>>>>>>>>>
>>>>>>>>> made by a
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>> signed written agreement.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> --
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     Tommy Becker
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     Senior Software Engineer
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     O +1 919.460.4747 <%28919%29%20460-4747> 
>>>>>>>>> <(919)%20460-4747>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>     tivo.com
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> ________________________________
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> This email and any attachments may contain
>>>>>>>>> confidential
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> privileged
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> material for the sole use of the intended
>>>>>>>>> recipient. Any
>>>>>>>>>
>>>>>>>>> review,
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>> copying,
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> or distribution of this email (or any
>>>>>>>>> attachments) by
>>>>>>>>>
>>>>>>>>> others is
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>> prohibited.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> If you are not the intended recipient, please
>>>>>>>>> contact the
>>>>>>>>>
>>>>>>>>> sender
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> immediately and permanently delete this email and
>>>>>>>>> any
>>>>>>>>>
>>>>>>>>> attachments. No
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> employee or agent of TiVo Inc. is authorized to
>>>>>>>>> conclude
>>>>>>>>>
>>>>>>>>> any binding
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> agreement on behalf of TiVo Inc. by email.
>>>>>>>>> Binding
>>>>>>>>>
>>>>>>>>> agreements with
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >> TiVo
>>>>>>>>>     >
>>>>>>>>>     >     >>>>> Inc. may only be made by a signed written
>>>>>>>>> agreement.
>>>>>>>>>     >
>>>>>>>>>     >     >>>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>>
>>>>>>>>>     >
>>>>>>>>>     >     >>>
>>>>>>>>>     >
>>>>>>>>>     >     >>
>>>>>>>>>     >
>>>>>>>>>     >     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > --
>>>>>>>>>     >
>>>>>>>>>     > <http://www.openbet.com/> <http://www.openbet.com/>
>>>>>>>>>
>>>>>>>>>     >
>>>>>>>>>     > *Michal Borowiecki*
>>>>>>>>>     >
>>>>>>>>>     > *Senior Software Engineer L4*
>>>>>>>>>     >
>>>>>>>>>     > *T: *
>>>>>>>>>     >
>>>>>>>>>     > +44 208 742 1600 <+44%2020%208742%201600> 
>>>>>>>>> <+44%2020%208742%201600>
>>>>>>>>>     >
>>>>>>>>>     > +44 203 249 8448 <+44%2020%203249%208448> 
>>>>>>>>> <+44%2020%203249%208448>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > *E: *
>>>>>>>>>     >
>>>>>>>>>     > michal.borowie...@openbet.com
>>>>>>>>>     >
>>>>>>>>>     > *W: *
>>>>>>>>>     >
>>>>>>>>>     > www.openbet.com
>>>>>>>>>     >
>>>>>>>>>     > *OpenBet Ltd*
>>>>>>>>>     >
>>>>>>>>>     > Chiswick Park Building 9
>>>>>>>>>     >
>>>>>>>>>     > 566 Chiswick High Rd
>>>>>>>>>     >
>>>>>>>>>     > London
>>>>>>>>>     >
>>>>>>>>>     > W4 5XT
>>>>>>>>>     >
>>>>>>>>>     > UK
>>>>>>>>>     >
>>>>>>>>>     > <https://www.openbet.com/email_promo> 
>>>>>>>>> <https://www.openbet.com/email_promo>
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     > This message is confidential and intended only for the
>>>>>>>>> addressee.
>>>>>>>>>
>>>>>>>>> If you
>>>>>>>>>
>>>>>>>>>     > have received this message in error, please immediately
>>>>>>>>> notify the
>>>>>>>>>     > postmas...@openbet.com and delete it from your system as
>>>>>>>>> well as
>>>>>>>>>
>>>>>>>>> any
>>>>>>>>>
>>>>>>>>>     > copies. The content of e-mails as well as traffic data may
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>> monitored by
>>>>>>>>>
>>>>>>>>>     > OpenBet for employment and security purposes. To protect
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> environment
>>>>>>>>>
>>>>>>>>>     > please do not print this e-mail unless necessary. OpenBet
>>>>>>>>> Ltd.
>>>>>>>>>
>>>>>>>>> Registered
>>>>>>>>>
>>>>>>>>>     > Office: Chiswick Park Building 9, 566 Chiswick High Road,
>>>>>>>>> London,
>>>>>>>>>
>>>>>>>>> W4 5XT,
>>>>>>>>>
>>>>>>>>>     > United Kingdom. A company registered in England and Wales.
>>>>>>>>>
>>>>>>>>> Registered no.
>>>>>>>>>
>>>>>>>>>     > 3134634. VAT no. GB927523612
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>     >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Tommy Becker
>>>>>>>>>
>>>>>>>>>     Senior Software Engineer
>>>>>>>>>
>>>>>>>>>     O +1 919.460.4747 <%28919%29%20460-4747>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     tivo.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>>
>>>>>>>>> This email and any attachments may contain confidential and 
>>>>>>>>> privileged material for the sole use of the intended recipient. Any 
>>>>>>>>> review, copying, or distribution of this email (or any attachments) 
>>>>>>>>> by others is prohibited. If you are not the intended recipient, 
>>>>>>>>> please contact the sender immediately and permanently delete this 
>>>>>>>>> email and any attachments. No employee or agent of TiVo Inc. is 
>>>>>>>>> authorized to conclude any binding agreement on behalf of TiVo Inc. 
>>>>>>>>> by email. Binding agreements with TiVo Inc. may only be made by a 
>>>>>>>>> signed written agreement.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> <http://www.openbet.com/> Michal Borowiecki
>>>>>>>>> Senior Software Engineer L4
>>>>>>>>> T: +44 208 742 1600 <+44%2020%208742%201600>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>> -- 
>>>>>> Signature
>>>>>> <http://www.openbet.com/>        Michal Borowiecki
>>>>>> Senior Software Engineer L4
>>>>>>  T:      +44 208 742 1600
>>>>>>
>>>>>>  
>>>>>>  +44 203 249 8448
>>>>>>
>>>>>>  
>>>>>>   
>>>>>>  E:      michal.borowie...@openbet.com
>>>>>>  W:      www.openbet.com <http://www.openbet.com/>
>>>>>>
>>>>>>  
>>>>>>  OpenBet Ltd
>>>>>>
>>>>>>  Chiswick Park Building 9
>>>>>>
>>>>>>  566 Chiswick High Rd
>>>>>>
>>>>>>  London
>>>>>>
>>>>>>  W4 5XT
>>>>>>
>>>>>>  UK
>>>>>>
>>>>>>  
>>>>>> <https://www.openbet.com/email_promo>
>>>>>>
>>>>>> This message is confidential and intended only for the addressee. If
>>>>>> you have received this message in error, please immediately notify the
>>>>>> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
>>>>>> from your system as well as any copies. The content of e-mails as well
>>>>>> as traffic data may be monitored by OpenBet for employment and
>>>>>> security purposes. To protect the environment please do not print this
>>>>>> e-mail unless necessary. OpenBet Ltd. Registered Office: Chiswick Park
>>>>>> Building 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A
>>>>>> company registered in England and Wales. Registered no. 3134634. VAT
>>>>>> no. GB927523612
>>>>>>
>>>>> -- 
>>>>> Signature
>>>>> <http://www.openbet.com/>         Michal Borowiecki
>>>>> Senior Software Engineer L4
>>>>>   T:      +44 208 742 1600
>>>>>
>>>>>   
>>>>>   +44 203 249 8448
>>>>>
>>>>>   
>>>>>    
>>>>>   E:      michal.borowie...@openbet.com
>>>>>   W:      www.openbet.com <http://www.openbet.com/>
>>>>>
>>>>>   
>>>>>   OpenBet Ltd
>>>>>
>>>>>   Chiswick Park Building 9
>>>>>
>>>>>   566 Chiswick High Rd
>>>>>
>>>>>   London
>>>>>
>>>>>   W4 5XT
>>>>>
>>>>>   UK
>>>>>
>>>>>   
>>>>> <https://www.openbet.com/email_promo>
>>>>>
>>>>> This message is confidential and intended only for the addressee. If you
>>>>> have received this message in error, please immediately notify the
>>>>> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
>>>>> from your system as well as any copies. The content of e-mails as well
>>>>> as traffic data may be monitored by OpenBet for employment and security
>>>>> purposes. To protect the environment please do not print this e-mail
>>>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>>>> GB927523612
>>>>>
>>> -- 
>>> Signature
>>> <http://www.openbet.com/>   Michal Borowiecki
>>> Senior Software Engineer L4
>>>     T:      +44 208 742 1600
>>>
>>>     
>>>     +44 203 249 8448
>>>
>>>     
>>>      
>>>     E:      michal.borowie...@openbet.com
>>>     W:      www.openbet.com <http://www.openbet.com/>
>>>
>>>     
>>>     OpenBet Ltd
>>>
>>>     Chiswick Park Building 9
>>>
>>>     566 Chiswick High Rd
>>>
>>>     London
>>>
>>>     W4 5XT
>>>
>>>     UK
>>>
>>>     
>>> <https://www.openbet.com/email_promo>
>>>
>>> This message is confidential and intended only for the addressee. If you
>>> have received this message in error, please immediately notify the
>>> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
>>> from your system as well as any copies. The content of e-mails as well
>>> as traffic data may be monitored by OpenBet for employment and security
>>> purposes. To protect the environment please do not print this e-mail
>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>> GB927523612
>>>
> 
> -- 
> Signature
> <http://www.openbet.com/>     Michal Borowiecki
> Senior Software Engineer L4
>       T:      +44 208 742 1600
> 
>       
>       +44 203 249 8448
> 
>       
>        
>       E:      michal.borowie...@openbet.com
>       W:      www.openbet.com <http://www.openbet.com/>
> 
>       
>       OpenBet Ltd
> 
>       Chiswick Park Building 9
> 
>       566 Chiswick High Rd
> 
>       London
> 
>       W4 5XT
> 
>       UK
> 
>       
> <https://www.openbet.com/email_promo>
> 
> This message is confidential and intended only for the addressee. If you
> have received this message in error, please immediately notify the
> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
> from your system as well as any copies. The content of e-mails as well
> as traffic data may be monitored by OpenBet for employment and security
> purposes. To protect the environment please do not print this e-mail
> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
> registered in England and Wales. Registered no. 3134634. VAT no.
> GB927523612
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-138: Change punctuate semantics

Reply via email to