Hi Vtygoss,

Thanks very much for sharing the scenarios!
Checkpointing is currently not supported in batch mode, so a snapshot cannot be created after the batch job finishes. However, there might be some alternative solutions:

1. Hybrid source [1] aims to let a job first read from a bounded source and then switch to an unbounded source, which seems to fit this case. However, it might not support the Table / SQL API yet; that support might be done in 1.15.
2. The batch job could first write its result to an intermediate table. The unbounded streaming job could then either load that table into state with the DataStream API on startup, or use a dimension join to continue processing new records.

Best,
Yun

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

------------------Original Mail ------------------
Sender: vtygoss <vtyg...@126.com>
Send Date: Wed Dec 1 17:52:17 2021
Recipients: Alexander Preuß <alexanderpre...@ververica.com>
CC: user@flink.apache.org <user@flink.apache.org>
Subject: Re: how to run streaming process after batch process is completed?

Hi Alexander,

This is my ideal data pipeline:

1. Sqoop transfers bounded data from a database to Hive. I think Flink batch processing is more efficient than streaming for this, so I want to process the bounded data in batch mode and write the result to HiveTable2.
2. There are some tools to transfer CDC / binlog to Kafka and to write the incremental unbounded data into HiveTable1. I want to process this unbounded data in streaming mode and update the incremental result in HiveTable2.

So this is the problem: the Flink streaming SQL application cannot be restored from the batch application. For example, for the SQL `insert into table_2 select count(1) from table_1`, the result stored in table_2 by the batch job is N, and I expect the accumulator to start from N, not 0, when the streaming job starts.

Thanks for your reply.

Best Regards!
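To make alternative 2 concrete, the core idea of continuing the count from the batch result N can be sketched in plain Python (this is a hypothetical illustration of seeding streaming state from the batch output, not Flink API; `load_batch_result` and the value 100 are made-up stand-ins for reading N from table_2):

```python
# Hypothetical sketch (plain Python, not a Flink API): the streaming
# accumulator is seeded with the count N that the earlier batch job
# wrote into table_2, so it continues from N instead of restarting at 0.

def load_batch_result() -> int:
    """Stand-in for reading the batch result N from the intermediate
    table (e.g. table_2) when the streaming job starts up."""
    return 100  # assume the batch job counted 100 rows

def make_counter(initial: int):
    """Return a per-record handler whose running count starts at `initial`."""
    state = {"count": initial}

    def on_record(record) -> int:
        state["count"] += 1  # each new record from table_1 increments the count
        return state["count"]

    return on_record

counter = make_counter(load_batch_result())
print(counter("row-a"))  # 101, not 1
print(counter("row-b"))  # 102
```

In a real Flink job the same seeding would happen via keyed state initialized on startup (DataStream API) or via a join against the intermediate table, as described above.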
On Nov 30, 2021 at 21:42, Alexander Preuß <alexanderpre...@ververica.com> wrote:

Hi Vtygoss,

Can you explain a bit more about your ideal pipeline? Is the batch data bounded, or could you also process it in streaming execution mode? And is the streaming data derived from the batch data, or do you just want to ensure that the batch has finished before processing the streaming data?

Best Regards,
Alexander

(sending again because I accidentally left out the user ml in the reply on the first try)

On Tue, Nov 30, 2021 at 12:38 PM vtygoss <vtyg...@126.com> wrote:

Hi, community!

With Flink, I want to unify batch processing and streaming processing in a data production pipeline: a batch job processes the inventory data, then a streaming job processes the incremental data. But I have hit a problem: there is no state after the batch job, so the result is wrong if I run the streaming job directly. How can I run the streaming job accurately after the batch job is completed? Is there any doc or demo for this scenario?

Thanks for any reply or suggestion!

Best Regards!