That sounds promising.

I am just wondering if `Time` is the best name. Maybe we want to add
other non-time based punctuations at some point later. I would suggest

enum PunctuationType {
  EVENT_TIME,
  SYSTEM_TIME,
}

or similar. Just to keep the door open -- it's easier to add new stuff
if the name is more generic.


-Matthias


On 4/4/17 5:30 AM, Thomas Becker wrote:
> I agree that the framework providing and managing the notion of stream
> time is valuable and not something we would want to delegate to the
> tasks. I'm not entirely convinced that a separate callback (option C)
> is that messy (it could just be a default method with an empty
> implementation), but if we wanted a single API to handle both cases,
> how about something like the following?
> 
> enum Time {
>    STREAM,
>    CLOCK
> }
> 
> Then on ProcessorContext:
> context.schedule(Time time, long interval)  // We could allow this to
> be called once for each value of time to mix approaches.
> 
> Then the Processor API becomes:
> punctuate(Time time) // time here denotes which schedule resulted in
> this call.
> 
> Thoughts?
> 
> 
> On Mon, 2017-04-03 at 22:44 -0700, Matthias J. Sax wrote:
>> Thanks a lot for the KIP Michal,
>>
>> I was thinking about the four options you proposed in more details
>> and
>> this are my thoughts:
>>
>> (A) You argue, that users can still "punctuate" on event-time via
>> process(), but I am not sure if this is possible. Note, that users
>> only
>> get record timestamps via context.timestamp(). Thus, users would need
>> to
>> track the time progress per partition (based on the partitions they
>> obverse via context.partition(). (This alone puts a huge burden on
>> the
>> user by itself.) However, users are not notified at startup what
>> partitions are assigned, and user are not notified when partitions
>> get
>> revoked. Because this information is not available, it's not possible
>> to
>> "manually advance" stream-time, and thus event-time punctuation
>> within
>> process() seems not to be possible -- or do you see a way to get it
>> done? And even if, it might still be too clumsy to use.
>>
>> (B) This does not allow to mix both approaches, thus limiting what
>> users
>> can do.
>>
>> (C) This should give all flexibility we need. However, just adding
>> one
>> more method seems to be a solution that is too simple (cf my comments
>> below).
>>
>> (D) This might be hard to use. Also, I am not sure how a user could
>> enable system-time and event-time punctuation in parallel.
>>
>>
>>
>> Overall options (C) seems to be the most promising approach to me.
>> Because I also favor a clean API, we might keep current punctuate()
>> as-is, but deprecate it -- so we can remove it at some later point
>> when
>> people use the "new punctuate API".
>>
>>
>> Couple of follow up questions:
>>
>> - I am wondering, if we should have two callback methods or just one
>> (ie, a unified for system and event time punctuation or one for
>> each?).
>>
>> - If we have one, how can the user figure out, which condition did
>> trigger?
>>
>> - How would the API look like, for registering different punctuate
>> schedules? The "type" must be somehow defined?
>>
>> - We might want to add "complex" schedules later on (like, punctuate
>> on
>> every 10 seconds event-time or 60 seconds system-time whatever comes
>> first). I don't say we should add this right away, but we might want
>> to
>> define the API in a way, that it allows extensions like this later
>> on,
>> without redesigning the API (ie, the API should be designed
>> extensible)
>>
>> - Did you ever consider count-based punctuation?
>>
>>
>> I understand, that you would like to solve a simple problem, but we
>> learned from the past, that just "adding some API" quickly leads to a
>> not very well defined API that needs time consuming clean up later on
>> via other KIPs. Thus, I would prefer to get a holistic punctuation
>> KIP
>> with this from the beginning on to avoid later painful redesign.
>>
>>
>>
>> -Matthias
>>
>>
>>
>> On 4/3/17 11:58 AM, Michal Borowiecki wrote:
>>>
>>> Thanks Thomas,
>>>
>>> I'm also wary of changing the existing semantics of punctuate, for
>>> backward compatibility reasons, although I like the conceptual
>>> simplicity of that option.
>>>
>>> Adding a new method to me feels safer but, in a way, uglier. I
>>> added
>>> this to the KIP now as option (C).
>>>
>>> The TimestampExtractor mechanism is actually more flexible, as it
>>> allows
>>> you to return any value, you're not limited to event time or system
>>> time
>>> (although I don't see an actual use case where you might need
>>> anything
>>> else then those two). Hence I also proposed the option to allow
>>> users
>>> to, effectively, decide what "stream time" is for them given the
>>> presence or absence of messages, much like they can decide what msg
>>> time
>>> means for them using the TimestampExtractor. What do you think
>>> about
>>> that? This is probably most flexible but also most complicated.
>>>
>>> All comments appreciated.
>>>
>>> Cheers,
>>>
>>> Michal
>>>
>>>
>>> On 03/04/17 19:23, Thomas Becker wrote:
>>>>
>>>> Although I fully agree we need a way to trigger periodic
>>>> processing
>>>> that is independent from whether and when messages arrive, I'm
>>>> not sure
>>>> I like the idea of changing the existing semantics across the
>>>> board.
>>>> What if we added an additional callback to Processor that can be
>>>> scheduled similarly to punctuate() but was always called at
>>>> fixed, wall
>>>> clock based intervals? This way you wouldn't have to give up the
>>>> notion
>>>> of stream time to be able to do periodic processing.
>>>>
>>>> On Mon, 2017-04-03 at 10:34 +0100, Michal Borowiecki wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have created a draft for KIP-138: Change punctuate semantics
>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+C
>>>>> hange+
>>>>> punctuate+semantics>
>>>>> .
>>>>>
>>>>> Appreciating there can be different views on system-time vs
>>>>> event-
>>>>> time
>>>>> semantics for punctuation depending on use-case and the
>>>>> importance of
>>>>> backwards compatibility of any such change, I've left it quite
>>>>> open
>>>>> and
>>>>> hope to fill in more info as the discussion progresses.
>>>>>
>>>>> Thanks,
>>>>> Michal
> --
> 
> 
>     Tommy Becker
> 
>     Senior Software Engineer
> 
>     O +1 919.460.4747
> 
>     tivo.com
> 
> 
> ________________________________
> 
> This email and any attachments may contain confidential and privileged 
> material for the sole use of the intended recipient. Any review, copying, or 
> distribution of this email (or any attachments) by others is prohibited. If 
> you are not the intended recipient, please contact the sender immediately and 
> permanently delete this email and any attachments. No employee or agent of 
> TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo 
> Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed 
> written agreement.
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to