Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-06, Matthias J. Sax
Sounds good. Thanks for clarifying.
-Matthias
On 2/6/25 3:19 PM, Almog Gavra wrote:
> Good call on the backwards compatibility; I updated the KIP. Re: the grace period for BatchWindows, I think zero makes sense (and it also makes implementing things a lot easier). In my mental model, we still drop …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-06, Almog Gavra
Good call on the backwards compatibility; I updated the KIP. Re: the grace period for BatchWindows, I think zero makes sense (and it also makes implementing things a lot easier). In my mental model, we still drop late records that come in after the window closes; they just never happen because we use …
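A minimal sketch of why a zero grace period would lose nothing here, assuming (as Bruno's question elsewhere in this thread implies) that a batch window is chosen from the current stream time rather than from the record's own timestamp. The close rule mirrors the usual Kafka Streams notion that a window is closed once stream time reaches its end plus grace; `BatchWindowSketch`, `isClosed`, and `alignedStart` are hypothetical names, not Kafka Streams code.

```java
// Hypothetical sketch in plain Java; not Kafka Streams code.
final class BatchWindowSketch {
    // A window counts as closed once stream time has reached its end plus grace.
    static boolean isClosed(long windowEnd, long graceMs, long streamTime) {
        return windowEnd + graceMs <= streamTime;
    }

    // Start of the tumbling window of the given size that contains `time`.
    static long alignedStart(long time, long sizeMs) {
        return (time / sizeMs) * sizeMs;
    }

    public static void main(String[] args) {
        long sizeMs = 1_000L;
        long graceMs = 0L;             // zero grace, as proposed for BatchWindows
        long streamTime = 5_250L;      // highest timestamp observed so far
        long recordTimestamp = 1_100L; // a very out-of-order record

        // Event-time assignment: the record's own window closed long ago.
        long byEventTime = alignedStart(recordTimestamp, sizeMs);
        // Assumption: batch assignment uses the current stream time instead.
        long byStreamTime = alignedStart(streamTime, sizeMs);

        System.out.println("event-time window closed?  " + isClosed(byEventTime + sizeMs, graceMs, streamTime));  // true
        System.out.println("stream-time window closed? " + isClosed(byStreamTime + sizeMs, graceMs, streamTime)); // false
    }
}
```

Because the stream-time window always ends after the current stream time, the drop check can still exist but never fires for it.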

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-06, Matthias J. Sax
Hit "reply" too early. I just re-read the KIP. `Windows#windowsFor(...)`, even if it is not intended to be implemented by users, is strictly public API. Thus, we cannot just change the method; we would need to keep the existing method and deprecate it, and add a new overload with a default impl …
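The compatibility pattern Matthias describes can be sketched in plain Java: keep the old public method, mark it `@Deprecated`, and add an overload whose default body delegates to it, so subclasses written against the old API keep compiling and working. `WindowsSpec`, `LegacyTumbling`, and the extra `streamTime` parameter are hypothetical stand-ins; this is not the signature the KIP adopted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a public windows class that users may subclass.
abstract class WindowsSpec {
    /** Existing public method: kept for compatibility, now deprecated. */
    @Deprecated
    public abstract Map<Long, Long> windowsFor(long timestamp);

    /**
     * New overload (hypothetical extra parameter). The default body delegates
     * to the old method, so existing subclasses need no changes.
     */
    public Map<Long, Long> windowsFor(long timestamp, long streamTime) {
        return windowsFor(timestamp);
    }
}

// A "legacy" subclass that only implements the old method.
final class LegacyTumbling extends WindowsSpec {
    private final long sizeMs;

    LegacyTumbling(long sizeMs) {
        this.sizeMs = sizeMs;
    }

    @Override
    @SuppressWarnings("deprecation")
    public Map<Long, Long> windowsFor(long timestamp) {
        long start = (timestamp / sizeMs) * sizeMs;
        Map<Long, Long> windows = new LinkedHashMap<>(); // window start -> window end
        windows.put(start, start + sizeMs);
        return windows;
    }
}

final class OverloadDemo {
    public static void main(String[] args) {
        WindowsSpec spec = new LegacyTumbling(1_000L);
        // Callers can switch to the new overload without breaking old subclasses.
        System.out.println(spec.windowsFor(1_500L, 9_000L)); // {1000=2000}
    }
}
```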

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-06, Matthias J. Sax
BatchWindows works for me.
On 2/6/25 7:34 AM, Almog Gavra wrote:
> Happy to name it BatchWindows. I'll give people some time to chime in and then change the name.
> - Almog
> On Tue, Feb 4, 2025 at 11:10 PM Sophie Blee-Goldman wrote:
>> One minor suggestion: use BatchWindows instead of BatchedWindows. …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-06, Almog Gavra
Happy to name it BatchWindows. I'll give people some time to chime in and then change the name.
- Almog
On Tue, Feb 4, 2025 at 11:10 PM Sophie Blee-Goldman wrote:
> One minor suggestion: use BatchWindows instead of BatchedWindows. The version without the "ed" matches up with the established naming pattern …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-04, Sophie Blee-Goldman
One minor suggestion: use BatchWindows instead of BatchedWindows. The version without the "ed" matches the established naming pattern and grammar used by the other Windows classes, e.g. TimeWindows, SessionWindows, SlidingWindows. Not a big deal though; I won't retract my +1 on the voting thread if …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-02-04, Almog Gavra
Thanks for the discussion, everyone! I've updated the Wiki with the following changes:
- Renamed to BatchedWindows
- Added a note in rejected alternatives about more general-purpose (micro-)batching functionality, since the scope of that is much wider.
Since it looks like we've stabilized the discussion …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-30, Matthias J. Sax
> … batch window with max N records, and then also specifying a BufferConfig.maxRecords()
These are actually two different and independent dimensions. "N records" would be the number of records in the window, but `maxRecords` is the number of unique keys/rows in the buffer before it's flushed. …
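To make the two dimensions concrete: a count-based batch would cap how many records fall into one window, while `BufferConfig.maxRecords()` bounds how many distinct keys the suppression buffer holds, since the buffer keeps only the latest value per key. This plain-Java sketch counts both for a made-up stream of records; `TwoDimensions`, `Rec`, and its methods are hypothetical and do not use the Kafka Streams API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoDimensions {
    record Rec(String key, long value) {}

    /** Records that landed in the window: what a count-based batch ("N records") would cap. */
    static int recordsInWindow(List<Rec> window) {
        return window.size();
    }

    /** Buffer entries: one per distinct key, because later values overwrite earlier ones. */
    static int bufferEntries(List<Rec> window) {
        Map<String, Long> buffer = new LinkedHashMap<>();
        for (Rec r : window) {
            buffer.put(r.key(), r.value()); // keep only the latest value per key
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        List<Rec> window = List.of(
            new Rec("a", 1), new Rec("b", 2), new Rec("a", 3),
            new Rec("a", 4), new Rec("b", 5), new Rec("a", 6));
        System.out.println("records in window: " + recordsInWindow(window)); // 6
        System.out.println("buffer entries:    " + bufferEntries(window));   // 2
    }
}
```

Six records but only two buffer entries: a limit on one says nothing about the other.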

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-30, Almog Gavra
I'm not opposed to "BatchedWindows"; I think I like that the most so far. I'll let that sit on the discussion thread for a while and change the KIP to match if there are no concerns.
> What I don't understand is why the relationship to suppress()/emitStrategy() is relevant. Can you elaborate a little bit …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-28, Matthias J. Sax
Interesting thoughts. So maybe we could go with `BatchWindows` as a name? Again, only spitballing... If we really put "(micro-)batching" at the center of this idea, I think both count-based and time-based (and time could be either stream time or wall-clock time), or any combination of …
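The idea of batching that is count-based, time-based, or any combination can be expressed as one close condition. The sketch below is hypothetical (no such class exists in Kafka Streams): `CountOrTimeBatcher` closes a batch when it holds `maxCount` records or when the driving clock, which could be stream time or wall-clock time, has advanced `maxSpanMs` past the batch's first record.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical micro-batcher: a batch closes on a count OR a time span, whichever comes first. */
final class CountOrTimeBatcher {
    private final int maxCount;   // count-based trigger
    private final long maxSpanMs; // time-based trigger
    private final List<Long> current = new ArrayList<>();
    private long batchStartTime;
    final List<List<Long>> closed = new ArrayList<>();

    CountOrTimeBatcher(int maxCount, long maxSpanMs) {
        this.maxCount = maxCount;
        this.maxSpanMs = maxSpanMs;
    }

    /** `now` is whatever clock drives batching: stream time or wall-clock time. */
    void add(long value, long now) {
        // Time trigger: the open batch has spanned too long, so close it before adding.
        if (!current.isEmpty() && now - batchStartTime >= maxSpanMs) {
            flush();
        }
        if (current.isEmpty()) {
            batchStartTime = now;
        }
        current.add(value);
        // Count trigger.
        if (current.size() >= maxCount) {
            flush();
        }
    }

    private void flush() {
        closed.add(List.copyOf(current));
        current.clear();
    }

    public static void main(String[] args) {
        CountOrTimeBatcher batcher = new CountOrTimeBatcher(3, 1_000L);
        batcher.add(1, 0);
        batcher.add(2, 100);
        batcher.add(3, 200);   // third record: count trigger closes [1, 2, 3]
        batcher.add(4, 300);
        batcher.add(5, 1_500); // 1,200 ms after record 4: time trigger closes [4]
        System.out.println(batcher.closed); // [[1, 2, 3], [4]]
    }
}
```

In this sketch the time trigger is only checked when a record arrives; a real design would presumably also need a timer (for example, a punctuation) to close a batch when no further records show up.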

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-28, Almog Gavra
Thanks for the feedback, Lucas and Bruno!
L0. "Given the motivation section, it sounds like we actually want something that I'd call "batching" rather than "windowing"."
You are right here, and I think ultimately introducing more flexible and controlled micro-batching will be useful for Kafka Streams, …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-28, Bruno Cadonna
Hi Almog, I had similar thoughts to Lucas. When I read the KIP, I asked myself: why are the windows not specified by number of records instead of time, if we do not care whether the event time of the records falls within the time range of the window? In your motivation, you write that users …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-28, Lucas Brutschy
Hi Almog, this seems useful to me. I don't see anything wrong with the details of the proposal. More generally, I'd like to hear your thoughts on this vs. batching. Given the motivation section, it sounds like we actually want something that I'd call "batching" rather than "windowing". If you do not …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-23, Matthias J. Sax
Thanks, Almog. Good call-out about `TimeWindows` vs. `TimeWindow` (yes, I am aware, and I actually re-read my previous email a few times before sending it to make sure I used the right one; it's very subtle). For `TimeWindows`, the semantics are certainly well defined, and there is nothing to …
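The `TimeWindows`/`TimeWindow` distinction is easy to miss: `TimeWindows` is the specification (size, advance, grace), while a `TimeWindow` is one concrete `[start, end)` interval it produces for a timestamp. The plain-Java sketch below is modeled loosely on how `TimeWindows#windowsFor` enumerates the hopping windows containing a timestamp; `WindowSpec` and `Interval` are hypothetical names so the example runs without Kafka on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowSpecDemo {
    /** Plays the role of a single TimeWindow: a half-open interval [start, end). */
    record Interval(long start, long end) {}

    /** Plays the role of TimeWindows: a specification that produces windows, not a window itself. */
    record WindowSpec(long sizeMs, long advanceMs) {
        /** All hopping windows whose interval contains the timestamp, oldest first. */
        List<Interval> windowsFor(long timestamp) {
            // Earliest aligned window start that can still contain the timestamp.
            long start = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
            List<Interval> result = new ArrayList<>();
            while (start <= timestamp) {
                result.add(new Interval(start, start + sizeMs));
                start += advanceMs;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        WindowSpec hopping = new WindowSpec(10_000L, 5_000L);
        // One spec, two concrete windows: [5000, 15000) and [10000, 20000).
        System.out.println(hopping.windowsFor(12_000L));
    }
}
```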

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-23, Almog Gavra
Thanks, Matthias, for the quick and detailed feedback!
> Nit: it seems you are mixing the terms "out-of-order" and "late" and using them as synonyms, which we usually do not do.
M1. Ah, in my mind "late arriving" was after the window closed but potentially still within the grace period (and "out of order" was just an …

Re: [DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-23, Matthias J. Sax
Interesting KIP. It's a known problem, and the proposed solution makes sense to me. Nit: it seems you are mixing the terms "out-of-order" and "late" and using them as synonyms, which we usually do not do. "Out-of-order" is the more generic term, while "late" means after the grace period (hence, …
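Matthias's terminology can be pinned down against stream time (the highest timestamp seen so far): a record is out-of-order whenever its timestamp is below stream time, and it is late only once its window's end plus grace has been reached. A minimal plain-Java sketch for tumbling windows; `Lateness`, `Kind`, and `classify` are hypothetical names.

```java
/** Hypothetical classifier for "out-of-order" vs. "late" (tumbling windows). */
final class Lateness {
    enum Kind { IN_ORDER, OUT_OF_ORDER_WITHIN_GRACE, LATE }

    static Kind classify(long recordTs, long streamTime, long sizeMs, long graceMs) {
        long windowEnd = (recordTs / sizeMs) * sizeMs + sizeMs;
        if (recordTs >= streamTime) {
            return Kind.IN_ORDER; // the record advances (or matches) stream time
        }
        // Out-of-order: older than stream time. It is late only if its window has closed.
        return windowEnd + graceMs <= streamTime ? Kind.LATE : Kind.OUT_OF_ORDER_WITHIN_GRACE;
    }

    public static void main(String[] args) {
        long streamTime = 10_000L, sizeMs = 1_000L, graceMs = 500L;
        System.out.println(classify(10_200L, streamTime, sizeMs, graceMs)); // IN_ORDER
        System.out.println(classify(9_700L, streamTime, sizeMs, graceMs));  // OUT_OF_ORDER_WITHIN_GRACE
        System.out.println(classify(8_400L, streamTime, sizeMs, graceMs));  // LATE
    }
}
```

The record at 9,700 falls in a window that has already ended but is still inside grace, which matches the case Almog describes as "late arriving" and Matthias would call out-of-order but not late.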

[DISCUSS] KIP-1124: Flexible Windows for Late Arriving Data

2025-01-22, Almog Gavra
Hello! I'd like to initiate a discussion thread on KIP-1127: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1127+Flexible+Windows+for+Late+Arriving+Data This KIP aims to make it easier to specify windowing semantics that are more tolerant of late-arriving data, particularly with suppression …
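For context on the suppression angle: with `suppress(Suppressed.untilWindowCloses(...))` or `emitStrategy(EmitStrategy.onWindowClose())`, Kafka Streams emits a window's final result only once stream time reaches the window end plus grace, so a record arriving after that point is dropped. The sketch below simulates that emit-on-close behavior for tumbling counts in plain Java; `EmitOnClose` and its fields are hypothetical, and this is an illustration, not the Kafka Streams implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of "emit only the final result when a window closes". */
final class EmitOnClose {
    private final long sizeMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE;
    private final TreeMap<Long, Long> openCounts = new TreeMap<>(); // window start -> count
    final List<String> emitted = new ArrayList<>();
    int dropped = 0;

    EmitOnClose(long sizeMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    void process(long timestamp) {
        streamTime = Math.max(streamTime, timestamp);
        long start = (timestamp / sizeMs) * sizeMs;
        if (start + sizeMs + graceMs <= streamTime) {
            dropped++; // the record's window has already closed
        } else {
            openCounts.merge(start, 1L, Long::sum);
        }
        // Emit every window whose end plus grace has been reached, oldest first.
        while (!openCounts.isEmpty()) {
            Map.Entry<Long, Long> oldest = openCounts.firstEntry();
            long end = oldest.getKey() + sizeMs;
            if (end + graceMs > streamTime) {
                break;
            }
            emitted.add("[" + oldest.getKey() + "," + end + ")=" + oldest.getValue());
            openCounts.pollFirstEntry();
        }
    }

    public static void main(String[] args) {
        EmitOnClose op = new EmitOnClose(1_000L, 0L);
        for (long ts : new long[] {100, 500, 1_200, 900, 2_100}) {
            op.process(ts);
        }
        System.out.println(op.emitted + ", dropped=" + op.dropped);
    }
}
```

With zero grace, the record at 900 is dropped because stream time had already reached 1,200 and the window `[0, 1000)` was emitted and closed.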