Re: Question about using batch to bootstrap state

Michael LeGore via user Mon, 09 Jun 2025 13:02:50 -0700

Hi,

My impression from the documentation was that the batch job would write a
savepoint when it finishes, using the same code as the BATCH job itself,
while also emitting the results of the BATCH job.

As it stands, you write a normal job using the DataStream API to do your
actual production work, and then you also have to write an equivalent job
using the State Processor API (which has a slightly different API). Given
the goal of unified batch and streaming, I was hoping that there was a way
to run the job in batch mode to produce results while also producing a
savepoint to use in streaming mode, all with the *same* code. Writing this
out, I realize that the main performance benefit of batch mode is probably
that it does not need to write state to RocksDB (or another state store),
so running in BATCH mode while still writing a savepoint would negate that
benefit. For our use case, though, it would still be useful to get the
input sorting that batch mode provides.

We need the input sorting because we would like to use batch mode to
"replay" old events into Flink in the correct order, so that our feature
extraction system produces correct point-in-time features that simulate
what would have happened in streaming mode. If there were a mode that acted
like streaming mode, except that it sorted per key like batch mode does,
but still output a checkpoint or savepoint and still fired timers at the
correct times (not at the end of each key), that would be ideal.
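
To make the semantics I am describing concrete, here is a minimal sketch in
plain Python. It does not use any Flink API; the names `replay_per_key` and
`timer_delay_ms`, and the "running sum that resets on a timer" feature, are
made up purely for illustration. It sorts events per key by event time and
fires each timer as soon as that key's event time reaches it, rather than
only once the key's input is exhausted.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# (key, event_time_ms, value) -- a hypothetical event shape for this sketch
Event = Tuple[str, int, float]
# (key, timestamp_ms, kind, running_sum) where kind is "event" or "timer"
Output = Tuple[str, int, str, float]


def replay_per_key(events: Iterable[Event], timer_delay_ms: int) -> List[Output]:
    """Replay events key by key in event-time order.

    The first event of a window registers a timer at event_time + delay.
    The timer fires, and resets the running sum, as soon as a later event
    of the same key reaches or passes the timer timestamp.
    """
    by_key = defaultdict(list)
    for key, ts, value in events:
        by_key[key].append((ts, value))

    output: List[Output] = []
    for key in sorted(by_key):
        running_sum = 0.0
        pending_timer: Optional[int] = None
        # Stable sort by timestamp only, so ties keep their arrival order.
        for ts, value in sorted(by_key[key], key=lambda e: e[0]):
            # Fire a due timer *before* handling the event that passed it.
            if pending_timer is not None and pending_timer <= ts:
                output.append((key, pending_timer, "timer", running_sum))
                running_sum = 0.0
                pending_timer = None
            running_sum += value
            if pending_timer is None:
                pending_timer = ts + timer_delay_ms
            output.append((key, ts, "event", running_sum))
        # Only a timer that is still pending fires at the end of the key.
        if pending_timer is not None:
            output.append((key, pending_timer, "timer", running_sum))
    return output
```

For example, for a key with events at 10, 30 and 100 and a 50 ms delay, the
timer registered at 10 fires at 60, before the event at 100 is processed, so
the event at 100 starts a fresh running sum.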

I am also still interested in an answer to my second question, about the
status of the batch -> streaming mode switchover described in
https://issues.apache.org/jira/browse/FLINK-33202

On Fri, May 30, 2025 at 11:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi Michael,
>
> > batch job emit a savepoint at the end of the job
>
> Not sure what do you mean by it's on the roadmap. There is the state
> processor API [1] and based on that one can write such job already.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/
>
> BR,
> G
>
>
> On Sat, May 31, 2025 at 1:02 AM Michael LeGore via user <
> user@flink.apache.org> wrote:
>
>> Hi all,
>>
>> I was wondering if there has been work done towards having a batch job
>> emit a savepoint at the end of the job? I see that it is mentioned in
>> documentation and some of the roadmap that this is planned, but I haven't
>> seen a JIRA ticket for it and I was wondering if that is still planned?
>>
>> I also am interested in where the batch to streaming switchover for
>> https://issues.apache.org/jira/browse/FLINK-33202 is, this would be a
>> really useful feature for our team and I am very interested in that work!
>>
>> Cheers,
>> Michael LeGore
>>
>
