Sounds promising to me too. I'll update the KIP with this as the primary proposal but leave the other alternatives there as under consideration for now.
Cheers, Michal On 4 April 2017 at 19:10, Matthias J. Sax <matth...@confluent.io> wrote: > That sounds promising. > > I am just wondering if `Time` is the best name. Maybe we want to add > other non-time based punctuations at some point later. I would suggest > > enum PunctuationType { > EVENT_TIME, > SYSTEM_TIME, > } > > or similar. Just to keep the door open -- it's easier to add new stuff > if the name is more generic. > > > -Matthias > > > On 4/4/17 5:30 AM, Thomas Becker wrote: > > I agree that the framework providing and managing the notion of stream > > time is valuable and not something we would want to delegate to the > > tasks. I'm not entirely convinced that a separate callback (option C) > > is that messy (it could just be a default method with an empty > > implementation), but if we wanted a single API to handle both cases, > > how about something like the following? > > > > enum Time { > > STREAM, > > CLOCK > > } > > > > Then on ProcessorContext: > > context.schedule(Time time, long interval) // We could allow this to > > be called once for each value of time to mix approaches. > > > > Then the Processor API becomes: > > punctuate(Time time) // time here denotes which schedule resulted in > > this call. > > > > Thoughts? > > > > > > On Mon, 2017-04-03 at 22:44 -0700, Matthias J. Sax wrote: > >> Thanks a lot for the KIP Michal, > >> > >> I was thinking about the four options you proposed in more details > >> and > >> this are my thoughts: > >> > >> (A) You argue, that users can still "punctuate" on event-time via > >> process(), but I am not sure if this is possible. Note, that users > >> only > >> get record timestamps via context.timestamp(). Thus, users would need > >> to > >> track the time progress per partition (based on the partitions they > >> obverse via context.partition(). (This alone puts a huge burden on > >> the > >> user by itself.) However, users are not notified at startup what > >> partitions are assigned, and user are not notified when partitions > >> get > >> revoked. Because this information is not available, it's not possible > >> to > >> "manually advance" stream-time, and thus event-time punctuation > >> within > >> process() seems not to be possible -- or do you see a way to get it > >> done? And even if, it might still be too clumsy to use. > >> > >> (B) This does not allow to mix both approaches, thus limiting what > >> users > >> can do. > >> > >> (C) This should give all flexibility we need. However, just adding > >> one > >> more method seems to be a solution that is too simple (cf my comments > >> below). > >> > >> (D) This might be hard to use. Also, I am not sure how a user could > >> enable system-time and event-time punctuation in parallel. > >> > >> > >> > >> Overall options (C) seems to be the most promising approach to me. > >> Because I also favor a clean API, we might keep current punctuate() > >> as-is, but deprecate it -- so we can remove it at some later point > >> when > >> people use the "new punctuate API". > >> > >> > >> Couple of follow up questions: > >> > >> - I am wondering, if we should have two callback methods or just one > >> (ie, a unified for system and event time punctuation or one for > >> each?). > >> > >> - If we have one, how can the user figure out, which condition did > >> trigger? > >> > >> - How would the API look like, for registering different punctuate > >> schedules? The "type" must be somehow defined? > >> > >> - We might want to add "complex" schedules later on (like, punctuate > >> on > >> every 10 seconds event-time or 60 seconds system-time whatever comes > >> first). I don't say we should add this right away, but we might want > >> to > >> define the API in a way, that it allows extensions like this later > >> on, > >> without redesigning the API (ie, the API should be designed > >> extensible) > >> > >> - Did you ever consider count-based punctuation? > >> > >> > >> I understand, that you would like to solve a simple problem, but we > >> learned from the past, that just "adding some API" quickly leads to a > >> not very well defined API that needs time consuming clean up later on > >> via other KIPs. Thus, I would prefer to get a holistic punctuation > >> KIP > >> with this from the beginning on to avoid later painful redesign. > >> > >> > >> > >> -Matthias > >> > >> > >> > >> On 4/3/17 11:58 AM, Michal Borowiecki wrote: > >>> > >>> Thanks Thomas, > >>> > >>> I'm also wary of changing the existing semantics of punctuate, for > >>> backward compatibility reasons, although I like the conceptual > >>> simplicity of that option. > >>> > >>> Adding a new method to me feels safer but, in a way, uglier. I > >>> added > >>> this to the KIP now as option (C). > >>> > >>> The TimestampExtractor mechanism is actually more flexible, as it > >>> allows > >>> you to return any value, you're not limited to event time or system > >>> time > >>> (although I don't see an actual use case where you might need > >>> anything > >>> else then those two). Hence I also proposed the option to allow > >>> users > >>> to, effectively, decide what "stream time" is for them given the > >>> presence or absence of messages, much like they can decide what msg > >>> time > >>> means for them using the TimestampExtractor. What do you think > >>> about > >>> that? This is probably most flexible but also most complicated. > >>> > >>> All comments appreciated. > >>> > >>> Cheers, > >>> > >>> Michal > >>> > >>> > >>> On 03/04/17 19:23, Thomas Becker wrote: > >>>> > >>>> Although I fully agree we need a way to trigger periodic > >>>> processing > >>>> that is independent from whether and when messages arrive, I'm > >>>> not sure > >>>> I like the idea of changing the existing semantics across the > >>>> board. > >>>> What if we added an additional callback to Processor that can be > >>>> scheduled similarly to punctuate() but was always called at > >>>> fixed, wall > >>>> clock based intervals? This way you wouldn't have to give up the > >>>> notion > >>>> of stream time to be able to do periodic processing. > >>>> > >>>> On Mon, 2017-04-03 at 10:34 +0100, Michal Borowiecki wrote: > >>>>> > >>>>> Hi all, > >>>>> > >>>>> I have created a draft for KIP-138: Change punctuate semantics > >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+C > >>>>> hange+ > >>>>> punctuate+semantics> > >>>>> . > >>>>> > >>>>> Appreciating there can be different views on system-time vs > >>>>> event- > >>>>> time > >>>>> semantics for punctuation depending on use-case and the > >>>>> importance of > >>>>> backwards compatibility of any such change, I've left it quite > >>>>> open > >>>>> and > >>>>> hope to fill in more info as the discussion progresses. > >>>>> > >>>>> Thanks, > >>>>> Michal > > -- > > > > > > Tommy Becker > > > > Senior Software Engineer > > > > O +1 919.460.4747 > > > > tivo.com > > > > > > ________________________________ > > > > This email and any attachments may contain confidential and privileged > material for the sole use of the intended recipient. Any review, copying, > or distribution of this email (or any attachments) by others is prohibited. > If you are not the intended recipient, please contact the sender > immediately and permanently delete this email and any attachments. No > employee or agent of TiVo Inc. is authorized to conclude any binding > agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo > Inc. may only be made by a signed written agreement. > > > > -- <http://www.openbet.com> Michal Borowiecki Technical Lead T: +44 208 742 1600 +44 203 249 8448 E: michal.borowie...@openbet.com W: www.openbet.com OpenBet Ltd Chiswick Park Building 9 566 Chiswick High Rd London W4 5XT <http://twitter.com/OpenBet_Ltd> <http://www.linkedin.com/company/165331/> <http://www.facebook.com/OpenBet> Winner of Sports Betting Supplier of the Year at the EGR B2B Awards 2010, 2011 & 2012 This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postmas...@openbet.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by OpenBet for employment and security purposes. To protect the environment please do not print this e-mail unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company registered in England and Wales. Registered no. 3134634. VAT no. GB927523612