Hi Vtygoss,

Thanks very much for sharing the scenarios!
Checkpointing is currently not supported in batch mode, so a snapshot cannot be created after the batch job finishes. However, there might be some alternative solutions:

1. Hybrid source [1] aims to let a job first read from a bounded source and then switch to an unbounded source, which seems to fit this case. However, it might not support the Table / SQL API yet; that support might be done in 1.15.
2. The batch job could first write its result to an intermediate table. The unbounded streaming job could then either load that table into state with the DataStream API on startup, or use a dimension join to continue processing new records.

Best,
Yun

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

------------------Original Mail ------------------
Sender: vtygoss <vtyg...@126.com>
Send Date: Wed Dec 1 17:52:17 2021
Recipients: Alexander Preuß <alexanderpre...@ververica.com>
CC: user@flink.apache.org <user@flink.apache.org>
Subject: Re: how to run streaming process after batch process is completed?

Hi Alexander,

This is my ideal data pipeline:

1. Sqoop transfers bounded data from a database to Hive. I think Flink batch processing is more efficient than streaming for this, so I want to process the bounded data in batch mode and write the result to HiveTable2.
2. There are some tools to transfer CDC / binlog to Kafka and to write the incremental unbounded data into HiveTable1. I want to process this unbounded data in streaming mode and update the incremental result in HiveTable2.

So this is the problem: the Flink streaming SQL application cannot be restored from the batch application. For example, for the SQL `insert into table_2 select count(1) from table_1`, the result stored in table_2 by the batch job is N, and I expect the accumulator to start from N, not 0, when the streaming job starts.

Thanks for your reply.

Best Regards!
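To make alternative 2 concrete, the core idea of continuing the count from the batch result N can be sketched in plain Python (this is a hypothetical illustration of seeding streaming state from the batch output, not Flink API; `load_batch_result` and the value 100 are made-up stand-ins for reading N from table_2):

```python
# Hypothetical sketch (plain Python, not a Flink API): the streaming
# accumulator is seeded with the count N that the earlier batch job
# wrote into table_2, so it continues from N instead of restarting at 0.

def load_batch_result() -> int:
    """Stand-in for reading the batch result N from the intermediate
    table (e.g. table_2) when the streaming job starts up."""
    return 100  # assume the batch job counted 100 rows

def make_counter(initial: int):
    """Return a per-record handler whose running count starts at `initial`."""
    state = {"count": initial}

    def on_record(record) -> int:
        state["count"] += 1  # each new record from table_1 increments the count
        return state["count"]

    return on_record

counter = make_counter(load_batch_result())
print(counter("row-a"))  # 101, not 1
print(counter("row-b"))  # 102
```

In a real Flink job the same seeding would happen via keyed state initialized on startup (DataStream API) or via a join against the intermediate table, as described above.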
On Nov 30, 2021 at 21:42, Alexander Preuß <alexanderpre...@ververica.com> wrote:

Hi Vtygoss,

Can you explain a bit more about your ideal pipeline? Is the batch data bounded, or could you also process it in streaming execution mode? And is the streaming data derived from the batch data, or do you just want to ensure that the batch has finished before processing the streaming data?

Best Regards,
Alexander

(sending again because I accidentally left out the user ml in the reply on the first try)

On Tue, Nov 30, 2021 at 12:38 PM vtygoss <vtyg...@126.com> wrote:

Hi, community!

With Flink, I want to unify batch processing and streaming processing in a data production pipeline: a batch job processes the inventory data, then a streaming job processes the incremental data. But I have hit a problem: there is no state after the batch job, so the result is wrong if I run the streaming job directly. How can I run the streaming job accurately after the batch job is completed? Is there any doc or demo for this scenario?

Thanks for any reply or suggestion!

Best Regards!