Thanks a lot for the KIP, Michal. I was thinking about the four options you proposed in more detail, and these are my thoughts:

(A) You argue that users can still "punctuate" on event-time via process(), but I am not sure this is possible. Note that users only get record timestamps via context.timestamp(). Thus, users would need to track the time progress per partition (based on the partitions they observe via context.partition()). This alone puts a huge burden on the user. However, users are not notified at startup which partitions are assigned, and users are not notified when partitions get revoked. Because this information is not available, it's not possible to "manually advance" stream-time, and thus event-time punctuation within process() does not seem to be possible -- or do you see a way to get it done? And even if it is possible, it might still be too clumsy to use.

(B) This does not allow mixing both approaches, thus limiting what users can do.

(C) This should give us all the flexibility we need. However, just adding one more method seems to be a solution that is too simple (cf. my comments below).

(D) This might be hard to use. Also, I am not sure how a user could enable system-time and event-time punctuation in parallel.

Overall, option (C) seems to be the most promising approach to me. Because I also favor a clean API, we might keep the current punctuate() as-is, but deprecate it -- so we can remove it at some later point when people use the "new punctuate API".

A couple of follow-up questions:

 - I am wondering if we should have two callback methods or just one (i.e., a unified one for system-time and event-time punctuation, or one for each?).
 - If we have one, how can the user figure out which condition triggered?
 - What would the API look like for registering different punctuate schedules? The "type" must be defined somehow?
 - We might want to add "complex" schedules later on (like, punctuate every 10 seconds of event-time or 60 seconds of system-time, whichever comes first).
   I'm not saying we should add this right away, but we might want to define the API in a way that allows extensions like this later on without redesigning the API (i.e., the API should be designed to be extensible).
 - Did you ever consider count-based punctuation?

I understand that you would like to solve a simple problem, but we have learned from the past that just "adding some API" quickly leads to a poorly defined API that needs time-consuming cleanup later on via other KIPs. Thus, I would prefer to get a holistic punctuation KIP from the beginning, to avoid a painful redesign later.

-Matthias


On 4/3/17 11:58 AM, Michal Borowiecki wrote:
> Thanks Thomas,
>
> I'm also wary of changing the existing semantics of punctuate, for
> backward compatibility reasons, although I like the conceptual
> simplicity of that option.
>
> Adding a new method to me feels safer but, in a way, uglier. I added
> this to the KIP now as option (C).
>
> The TimestampExtractor mechanism is actually more flexible, as it allows
> you to return any value, you're not limited to event time or system time
> (although I don't see an actual use case where you might need anything
> else than those two). Hence I also proposed the option to allow users
> to, effectively, decide what "stream time" is for them given the
> presence or absence of messages, much like they can decide what msg time
> means for them using the TimestampExtractor. What do you think about
> that? This is probably most flexible but also most complicated.
>
> All comments appreciated.
>
> Cheers,
>
> Michal
>
>
> On 03/04/17 19:23, Thomas Becker wrote:
>> Although I fully agree we need a way to trigger periodic processing
>> that is independent from whether and when messages arrive, I'm not sure
>> I like the idea of changing the existing semantics across the board.
>> What if we added an additional callback to Processor that can be
>> scheduled similarly to punctuate() but was always called at fixed, wall
>> clock based intervals? This way you wouldn't have to give up the notion
>> of stream time to be able to do periodic processing.
>>
>> On Mon, 2017-04-03 at 10:34 +0100, Michal Borowiecki wrote:
>>> Hi all,
>>>
>>> I have created a draft for KIP-138: Change punctuate semantics
>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics>.
>>>
>>> Appreciating there can be different views on system-time vs event-time
>>> semantics for punctuation depending on use-case and the importance of
>>> backwards compatibility of any such change, I've left it quite open
>>> and hope to fill in more info as the discussion progresses.
>>>
>>> Thanks,
>>> Michal
>> --
>> Tommy Becker
>> Senior Software Engineer
>> tivo.com
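
As an illustration of point (A) above, here is a minimal sketch of the per-partition bookkeeping a user would have to do inside process(). PartitionTimeTracker is a hypothetical helper, not part of Kafka Streams, and the sketch assumes stream-time is taken as the minimum over the per-partition timestamps. It would be fed from context.partition() and context.timestamp(); plain ints and longs are used here so the snippet stands alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper (not a Kafka Streams class): remembers the highest
// record timestamp seen per partition and derives a "stream-time" from it.
class PartitionTimeTracker {
    private final Map<Integer, Long> maxTimestampByPartition = new HashMap<>();

    // Intended to be called from process() with the values of
    // context.partition() and context.timestamp() for the current record.
    void observe(int partition, long timestamp) {
        // Keep the largest timestamp per partition; out-of-order records
        // do not move the partition's time backwards.
        maxTimestampByPartition.merge(partition, timestamp, Long::max);
    }

    // Assumption: stream-time is the minimum over all partitions observed so far.
    // Returns empty if no record has been observed yet.
    OptionalLong streamTime() {
        return maxTimestampByPartition.values().stream()
                .mapToLong(Long::longValue)
                .min();
    }
}
```

For example, after observe(0, 100) and observe(1, 500), streamTime() is 100. If partition 0 is then revoked, the helper is never told, so later calls such as observe(1, 900) leave streamTime() stuck at 100. Likewise, a partition that is assigned but has not yet delivered a record is simply missing from the map, so the computed value can run ahead of the real stream-time.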