I have some use cases with global-ish context that I'd like to partition
my pipeline by, but that isn't based on time. Does it seem reasonable to
use windowing to encapsulate this kind of global context anyway?

Contrived example: imagine I have a workflow for finding the
highest-scoring Scrabble word for a given input set of letters.

--(set[str])-->[EnumerateAllPossibleWords]-->(str)-->[KeepTopNWords]-->(str)

Now, if I want to use this pipeline for multiple input letter sets, I'll
end up mixing together candidate words that come from different letter
sets. I could incorporate some kind of ID for these letter sets (e.g. a
ScrabbleGameID) to partition with later, but then I'd need to propagate
that key everywhere. For example, `EnumerateAllPossibleWords` may do its
own keyed operations internally, and all of them would then need to
accommodate the bookkeeping for ScrabbleGameID.
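For concreteness, here's roughly what that explicit-keying bookkeeping looks like (plain Python with made-up names like `keep_top_n_words`, just to illustrate — every downstream step has to carry and group on the game ID):

```python
import heapq
from collections import defaultdict

# Hypothetical: each candidate word is tagged with the game it came from,
# so elements are (game_id, word, score) instead of just (word, score).
candidates = [
    ("game-1", "QUARTZ", 24),
    ("game-1", "ART", 3),
    ("game-2", "JAZZ", 29),
    ("game-2", "ZAG", 13),
    ("game-1", "RAT", 3),
]

def keep_top_n_words(tagged_words, n=2):
    # The top-N step now has to group by game_id explicitly,
    # even though "top N words" doesn't conceptually care about games.
    by_game = defaultdict(list)
    for game_id, word, score in tagged_words:
        by_game[game_id].append((score, word))
    return {
        game_id: heapq.nlargest(n, scored)
        for game_id, scored in by_game.items()
    }

print(keep_top_n_words(candidates))
```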

Generating windows that are actually based on ScrabbleGameID (e.g. one
window per letter set) feels like a nice way to implicitly partition my
pipeline so I don't have to include ScrabbleGameID into transforms that
really don't care about it.
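A toy sketch of what I mean (not real Beam code — `GameWindow` and `assign_windows` are invented names): the window acts as an opaque partition key the runner carries alongside each element, so the transforms themselves never see the game ID.

```python
class GameWindow:
    """A 'window' identified by a ScrabbleGameID rather than a time range."""
    def __init__(self, game_id):
        self.game_id = game_id
    def __eq__(self, other):
        return isinstance(other, GameWindow) and self.game_id == other.game_id
    def __hash__(self):
        return hash(self.game_id)

def assign_windows(elements_with_game):
    # Analogous to a WindowFn's assign step: attach a window to each element.
    return [(GameWindow(gid), value) for gid, value in elements_with_game]

def group_by_window(windowed):
    # Analogous to a grouping step that respects window boundaries;
    # transforms between assign and group never touch the game ID.
    groups = {}
    for window, value in windowed:
        groups.setdefault(window, []).append(value)
    return groups

windowed = assign_windows([("game-1", "QUARTZ"), ("game-2", "JAZZ"),
                           ("game-1", "RAT")])
print(group_by_window(windowed))
```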

When looking at windowing functions, though, they're all very
timestamp-based, which made me pause and wonder whether I'm abusing the
window abstraction, or whether timestamp-based windows are just a subset
of windows that get more attention b/c of streaming.

(sorry hope this makes sense and is not just a ramble)
