Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-18 Thread Jungtaek Lim
Heads-up: I filed SPARK-39805 to perform deprecation of Trigger.Once in Spark 3.4.0. Regarding Adam's use case: although there is no good solution to this, it is fairly easy to cover the specific use case of Trigger.Once via having a flag in Trigger.AvailableNow to enforce processing all available

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-11 Thread Jungtaek Lim
Final reminder. I'll leave this thread for a couple of days to see further voices, and go forward if there is no outstanding comment. On Sat, Jul 9, 2022 at 9:54 PM Jungtaek Lim wrote: > It sounds like none of the approaches perfectly solve the issue of > backfill. > > 1. Trigger.Once: scale iss

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-09 Thread Jungtaek Lim
It sounds like none of the approaches perfectly solve the issue of backfill. 1. Trigger.Once: scale issue 2. Trigger.AvailableNow: watermark advancement issue (data getting dropped due to watermark) depending on the order of data 3. Manual batch: state is not built from processing backfill Handlin
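
To make the first two trade-offs concrete, here is a minimal toy model in plain Python. It is not Spark code: `run_batches`, `max_per_batch`, and `delay` are hypothetical names chosen for illustration. It assumes the watermark is recomputed only at batch boundaries as (max event time seen so far) minus `delay`, and that a record whose event time is below the current watermark is dropped. A single batch holding everything stands in for Trigger.Once; several size-limited batches stand in for Trigger.AvailableNow.

```python
from typing import List, Tuple


def run_batches(event_times: List[int], max_per_batch: int,
                delay: int) -> Tuple[List[int], List[int]]:
    """Toy model of micro-batch processing with an event-time watermark.

    Assumption (not Spark's implementation): the watermark only moves
    between batches, and records older than it are dropped.
    """
    watermark = 0  # nothing is considered late before the first batch
    kept: List[int] = []
    dropped: List[int] = []
    for start in range(0, len(event_times), max_per_batch):
        batch = event_times[start:start + max_per_batch]
        for t in batch:
            if t < watermark:
                dropped.append(t)  # late relative to the watermark
            else:
                kept.append(t)
        # Advance the watermark after the batch; it never moves backwards.
        watermark = max(watermark, max(batch) - delay)
    return kept, dropped


events = [100, 101, 5, 6, 102]  # backlog with old records behind new ones

# One batch containing everything (Trigger.Once-like): nothing is dropped,
# but the whole backlog has to fit in a single batch.
once_kept, once_dropped = run_batches(events, max_per_batch=len(events), delay=10)

# Size-limited batches (Trigger.AvailableNow-like): the first batch pushes
# the watermark to 91, so the older records 5 and 6 are dropped.
multi_kept, multi_dropped = run_batches(events, max_per_batch=2, delay=10)

# The same data in event-time order loses nothing, which is why the
# outcome depends on the order of the data.
sorted_kept, sorted_dropped = run_batches(sorted(events), max_per_batch=2, delay=10)
```

In this sketch the only difference between the three runs is batch size and input order, which is the point of items 1 and 2 above: a single batch avoids the drops at the cost of scale, while multiple batches scale but can lose out-of-order backlog data.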

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Adam Binford
Dang I was hoping it was the second one. In our case the data is too large to run the whole backfill for the aggregation in a single batch (the shuffle is too big). We currently resort to manually batching (i.e. not streaming) the backlog (anything older than the watermark) when we need to reproces

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
Thanks for the input, Adam! Replying inline. On Fri, Jul 8, 2022 at 8:48 PM Adam Binford wrote: > We use Trigger.Once a lot, usually for backfilling data for new streams. I > feel like I could see a continuing use case for "ignore trigger limits for > this batch" (ignoring the whole issue with r

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Adam Binford
We use Trigger.Once a lot, usually for backfilling data for new streams. I feel like I could see a continuing use case for "ignore trigger limits for this batch" (ignoring the whole issue with re-running the last failed batch vs a new batch), but we haven't actually been able to upgrade yet and try

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
Bump to give the proposal a chance to reach a wider audience. Given that there are not many active contributors/maintainers in the Structured Streaming area, I'd treat the discussion as "lazy consensus" to avoid being stuck. I'll give a final reminder early next week, and move forward if there ar

[DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-06 Thread Jungtaek Lim
Hi dev, I would like to hear opinions on deprecating Trigger.Once and promoting Trigger.AvailableNow as a replacement [1] in Structured Streaming. (This doesn't mean we remove Trigger.Once now or in the near future; that would probably require a separate discussion at some point.) Rationale: The expected be