That sounds promising. I am just wondering if `Time` is the best name. Maybe we want to add other non-time based punctuations at some point later. I would suggest

    enum PunctuationType {
      EVENT_TIME,
      SYSTEM_TIME,
    }

or similar. Just to keep the door open -- it's easier to add new stuff
if the name is more generic.


-Matthias


On 4/4/17 5:30 AM, Thomas Becker wrote:
> I agree that the framework providing and managing the notion of stream
> time is valuable and not something we would want to delegate to the
> tasks. I'm not entirely convinced that a separate callback (option C)
> is that messy (it could just be a default method with an empty
> implementation), but if we wanted a single API to handle both cases,
> how about something like the following?
>
> enum Time {
>    STREAM,
>    CLOCK
> }
>
> Then on ProcessorContext:
> context.schedule(Time time, long interval)  // We could allow this to
> be called once for each value of time to mix approaches.
>
> Then the Processor API becomes:
> punctuate(Time time) // time here denotes which schedule resulted in
> this call.
>
> Thoughts?
>
>
> On Mon, 2017-04-03 at 22:44 -0700, Matthias J. Sax wrote:
>> Thanks a lot for the KIP Michal,
>>
>> I was thinking about the four options you proposed in more detail,
>> and these are my thoughts:
>>
>> (A) You argue that users can still "punctuate" on event-time via
>> process(), but I am not sure if this is possible. Note that users
>> only get record timestamps via context.timestamp(). Thus, users would
>> need to track the time progress per partition (based on the
>> partitions they observe via context.partition()). (This alone puts a
>> huge burden on the user.) However, users are not notified at startup
>> what partitions are assigned, and users are not notified when
>> partitions get revoked. Because this information is not available,
>> it's not possible to "manually advance" stream-time, and thus
>> event-time punctuation within process() seems not to be possible --
>> or do you see a way to get it done? And even if there is one, it
>> might still be too clumsy to use.
>>
>> (B) This does not allow mixing both approaches, thus limiting what
>> users can do.
>>
>> (C) This should give all the flexibility we need. However, just
>> adding one more method seems to be a solution that is too simple
>> (cf. my comments below).
>>
>> (D) This might be hard to use. Also, I am not sure how a user could
>> enable system-time and event-time punctuation in parallel.
>>
>>
>>
>> Overall, option (C) seems to be the most promising approach to me.
>> Because I also favor a clean API, we might keep the current
>> punctuate() as-is, but deprecate it -- so we can remove it at some
>> later point when people use the "new punctuate API".
>>
>>
>> A couple of follow-up questions:
>>
>> - I am wondering if we should have two callback methods or just one
>> (i.e., a unified one for system and event time punctuation, or one
>> for each?).
>>
>> - If we have one, how can the user figure out which condition did
>> trigger?
>>
>> - What would the API look like for registering different punctuate
>> schedules? The "type" must be defined somehow?
>>
>> - We might want to add "complex" schedules later on (like, punctuate
>> on every 10 seconds event-time or 60 seconds system-time, whichever
>> comes first). I don't say we should add this right away, but we might
>> want to define the API in a way that allows extensions like this
>> later on, without redesigning the API (i.e., the API should be
>> designed to be extensible).
>>
>> - Did you ever consider count-based punctuation?
>>
>>
>> I understand that you would like to solve a simple problem, but we
>> learned from the past that just "adding some API" quickly leads to a
>> not very well defined API that needs time-consuming cleanup later on
>> via other KIPs. Thus, I would prefer to get a holistic punctuation
>> KIP from the beginning, to avoid a painful redesign later.
>>
>>
>>
>> -Matthias
>>
>>
>>
>> On 4/3/17 11:58 AM, Michal Borowiecki wrote:
>>>
>>> Thanks Thomas,
>>>
>>> I'm also wary of changing the existing semantics of punctuate, for
>>> backward compatibility reasons, although I like the conceptual
>>> simplicity of that option.
>>>
>>> Adding a new method to me feels safer but, in a way, uglier. I
>>> added this to the KIP now as option (C).
>>>
>>> The TimestampExtractor mechanism is actually more flexible, as it
>>> allows you to return any value; you're not limited to event time or
>>> system time (although I don't see an actual use case where you might
>>> need anything other than those two). Hence I also proposed the
>>> option to allow users to, effectively, decide what "stream time" is
>>> for them given the presence or absence of messages, much like they
>>> can decide what msg time means for them using the
>>> TimestampExtractor. What do you think about that? This is probably
>>> most flexible but also most complicated.
>>>
>>> All comments appreciated.
>>>
>>> Cheers,
>>>
>>> Michal
>>>
>>>
>>> On 03/04/17 19:23, Thomas Becker wrote:
>>>>
>>>> Although I fully agree we need a way to trigger periodic
>>>> processing that is independent from whether and when messages
>>>> arrive, I'm not sure I like the idea of changing the existing
>>>> semantics across the board. What if we added an additional
>>>> callback to Processor that can be scheduled similarly to
>>>> punctuate() but was always called at fixed, wall-clock-based
>>>> intervals? This way you wouldn't have to give up the notion of
>>>> stream time to be able to do periodic processing.
>>>>
>>>> On Mon, 2017-04-03 at 10:34 +0100, Michal Borowiecki wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have created a draft for KIP-138: Change punctuate semantics
>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics>.
>>>>>
>>>>> Appreciating there can be different views on system-time vs
>>>>> event-time semantics for punctuation depending on use-case and
>>>>> the importance of backwards compatibility of any such change,
>>>>> I've left it quite open and hope to fill in more info as the
>>>>> discussion progresses.
>>>>>
>>>>> Thanks,
>>>>> Michal
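
To make the single-API idea from this thread concrete, here is a minimal, self-contained Java sketch. It is an illustration under assumptions, not the Kafka Streams API: `PunctuationSketch`, `Punctuator`, `Scheduler`, and `advance()` are hypothetical names invented for this example, and passing a timestamp to `punctuate()` is only one possible shape. It shows `schedule()` being called once per type to mix both approaches, and the callback being told which schedule triggered it.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class PunctuationSketch {

    /** Which notion of time triggered a punctuation (names as suggested in this thread). */
    enum PunctuationType { EVENT_TIME, SYSTEM_TIME }

    /** Hypothetical callback; NOT the actual Kafka Streams Processor interface. */
    interface Punctuator {
        void punctuate(PunctuationType type, long timestamp);
    }

    /** Hypothetical stand-in for the scheduling part of ProcessorContext. */
    static class Scheduler {
        private final Map<PunctuationType, Long> intervals = new EnumMap<>(PunctuationType.class);
        private final Map<PunctuationType, Long> nextDue = new EnumMap<>(PunctuationType.class);
        private final Punctuator punctuator;

        Scheduler(Punctuator punctuator) {
            this.punctuator = punctuator;
        }

        /** May be called once per type, so both approaches can be mixed. */
        void schedule(PunctuationType type, long interval) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            intervals.put(type, interval);
            nextDue.remove(type); // baseline is set on the first advance() for this type
        }

        /**
         * Assumption: the framework would call this with the current stream time
         * (EVENT_TIME) or wall-clock time (SYSTEM_TIME). Fires once per elapsed interval.
         */
        void advance(PunctuationType type, long now) {
            Long interval = intervals.get(type);
            if (interval == null) {
                return; // nothing scheduled for this type
            }
            Long due = nextDue.get(type);
            if (due == null) {
                nextDue.put(type, now + interval); // first observation: only set the baseline
                return;
            }
            long next = due;
            while (now >= next) {
                punctuator.punctuate(type, next);
                next += interval;
            }
            nextDue.put(type, next);
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        Scheduler scheduler = new Scheduler((type, ts) -> calls.add(type + "@" + ts));

        scheduler.schedule(PunctuationType.EVENT_TIME, 10);
        scheduler.schedule(PunctuationType.SYSTEM_TIME, 60);

        scheduler.advance(PunctuationType.EVENT_TIME, 0);   // baseline, first due at 10
        scheduler.advance(PunctuationType.SYSTEM_TIME, 0);  // baseline, first due at 60
        scheduler.advance(PunctuationType.EVENT_TIME, 25);  // fires for 10 and 20
        scheduler.advance(PunctuationType.SYSTEM_TIME, 60); // fires for 60

        System.out.println(calls); // [EVENT_TIME@10, EVENT_TIME@20, SYSTEM_TIME@60]
    }
}
```

Using an enum parameter rather than a second callback keeps a single `punctuate()` entry point, and a new non-time-based type (for example count-based) would only need a new enum constant in this sketch.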