Heads-up: I filed SPARK-39805 to deprecate Trigger.Once in
Spark 3.4.0.
Regarding Adam's use case: although there is no good solution to this, it
is fairly easy to cover the specific use case of Trigger.Once by adding a
flag to Trigger.AvailableNow that enforces processing all available
Final reminder. I'll leave this thread open for a couple of days to hear
further voices, and move forward if there are no outstanding comments.
On Sat, Jul 9, 2022 at 9:54 PM Jungtaek Lim wrote:
It sounds like none of the approaches fully solves the backfill issue:
1. Trigger.Once: scale issue
2. Trigger.AvailableNow: watermark advancement issue (data can be dropped
   as late once the watermark advances), depending on the order of the data
3. Manual batch: state is not built up from processing the backfill
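To make item 2 concrete, here is a toy Python model (not Spark's actual implementation) of why splitting a backfill into several micro-batches can drop data. It assumes, for illustration only, that the watermark is recomputed between batches as the maximum event time seen so far minus a fixed delay, and that any record older than the current watermark is dropped as late. The function name `simulate_watermark_drops` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def simulate_watermark_drops(
    event_times: List[int], batch_size: int, delay: int
) -> Tuple[List[int], List[int]]:
    """Toy model (hypothetical, not Spark code) of watermark-based late-data dropping.

    Records are processed in arrival order, `batch_size` per micro-batch.
    A record is dropped if its event time is older than the watermark computed
    at the end of the previous batch. Returns (kept, dropped).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    watermark: Optional[int] = None  # no watermark until the first batch completes
    max_seen: Optional[int] = None
    kept: List[int] = []
    dropped: List[int] = []
    for start in range(0, len(event_times), batch_size):
        batch = event_times[start:start + batch_size]
        for t in batch:
            if watermark is not None and t < watermark:
                dropped.append(t)  # too late relative to the current watermark
            else:
                kept.append(t)
        # Assumption: the watermark only advances between batches,
        # based on the max event time seen so far.
        batch_max = max(batch)
        max_seen = batch_max if max_seen is None else max(max_seen, batch_max)
        watermark = max_seen - delay
    return kept, dropped


backlog = [100, 90, 80, 10, 20, 30]  # newest events arrive first
print(simulate_watermark_drops(backlog, batch_size=len(backlog), delay=15))
# ([100, 90, 80, 10, 20, 30], [])
print(simulate_watermark_drops(backlog, batch_size=3, delay=15))
# ([100, 90, 80], [10, 20, 30])
print(simulate_watermark_drops(sorted(backlog), batch_size=3, delay=15))
# ([10, 20, 30, 80, 90, 100], [])
```

In this model, one big batch (the Trigger.Once case) keeps everything, while rate-limited batches (the Trigger.AvailableNow case) drop the older half when the backlog arrives newest-first; the same events in ascending order lose nothing, which is the "depending on the order of data" caveat.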
Handlin
Dang, I was hoping it was the second one. In our case the data is too large
to run the whole backfill for the aggregation in a single batch (the
shuffle is too big). We currently resort to manually batching (i.e. not
streaming) the backlog (anything older than the watermark) when we need to
reproces
Thanks for the input, Adam! Replying inline.
On Fri, Jul 8, 2022 at 8:48 PM Adam Binford wrote:
We use Trigger.Once a lot, usually for backfilling data for new streams. I
feel like I could see a continuing use case for "ignore trigger limits for
this batch" (ignoring the whole issue with re-running the last failed batch
vs a new batch), but we haven't actually been able to upgrade yet and try
Bumping this to expose the proposal to a wider audience.
Given that there are not many active contributors/maintainers in the
Structured Streaming area, I'd consider the discussion "lazy consensus" to
avoid getting stuck. I'll give a final reminder early next week, and move
forward if there ar
Hi dev,
I would like to hear opinions on deprecating Trigger.Once and promoting
Trigger.AvailableNow as its replacement [1] in Structured Streaming.
(This doesn't mean we would remove Trigger.Once now or in the near future;
that would probably require a separate discussion later.)
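For readers less familiar with the two triggers, here is a toy Python model of their batching difference, under stated assumptions: Trigger.Once processes everything available in a single micro-batch, while Trigger.AvailableNow processes the data available when the query starts in possibly multiple micro-batches, honoring per-batch source limits such as maxFilesPerTrigger. The function `plan_batches` and its parameters are hypothetical; in a real query the choice is made with `Trigger.Once()` / `Trigger.AvailableNow()` (Scala/Java) or `trigger(once=True)` / `trigger(availableNow=True)` (PySpark).

```python
from typing import List, Optional


def plan_batches(available: int, max_per_batch: Optional[int], trigger: str) -> List[range]:
    """Toy model (hypothetical, not Spark code) of how a backlog of `available`
    offsets is split into micro-batches.

    - "once": a single batch containing everything, ignoring any per-batch limit.
    - "available_now": everything available at start, split into chunks of at
      most `max_per_batch` offsets (a single chunk when no limit is set).
    """
    if trigger not in ("once", "available_now"):
        raise ValueError(f"unknown trigger: {trigger!r}")
    if available <= 0:
        return []
    if trigger == "once" or max_per_batch is None:
        return [range(0, available)]
    if max_per_batch <= 0:
        raise ValueError("max_per_batch must be positive")
    # Chunk the snapshot of available offsets to respect the per-batch limit.
    return [
        range(start, min(start + max_per_batch, available))
        for start in range(0, available, max_per_batch)
    ]


print(plan_batches(10, 4, "once"))           # [range(0, 10)]
print(plan_batches(10, 4, "available_now"))  # [range(0, 4), range(4, 8), range(8, 10)]
```

The single oversized batch in the "once" case is the scale issue mentioned later in the thread; chunking bounds the size of each batch at the cost of advancing the watermark between chunks.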
Rationale:
The expected be