Hi Flink community,

We have a fairly complex SQL job: it unions 5 topics, deduplicates by key and 
does some daily aggregations. The state TTL is 40 days. We want to be able to 
bootstrap its state from S3 or ClickHouse, and we would like a general solution 
that we can reuse for other SQL jobs as well.
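
To make the discussion more concrete, the job looks roughly like the sketch 
below (all table and column names are made up here, and the real query is more 
involved):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DailyAggJobSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // 40-day TTL for the deduplication and aggregation state.
    tEnv.getConfig().getConfiguration().setString("table.exec.state.ttl", "40 d");

    // events_1 .. events_5 are Kafka-backed tables with identical schemas,
    // daily_aggregates is the sink table.
    tEnv.executeSql(
        "INSERT INTO daily_aggregates\n"
            + "SELECT event_key, CAST(event_time AS DATE) AS dt,\n"
            + "       COUNT(*) AS cnt, SUM(amount) AS total\n"
            + "FROM (\n"
            + "  SELECT *, ROW_NUMBER() OVER (\n"
            + "    PARTITION BY event_id ORDER BY event_time DESC) AS rn\n"
            + "  FROM (SELECT * FROM events_1\n"
            + "        UNION ALL SELECT * FROM events_2\n"
            + "        UNION ALL SELECT * FROM events_3\n"
            + "        UNION ALL SELECT * FROM events_4\n"
            + "        UNION ALL SELECT * FROM events_5)\n"
            + ")\n"
            + "WHERE rn = 1\n"
            + "GROUP BY event_key, CAST(event_time AS DATE)");
  }
}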

So far I haven’t found a working solution to this. I’d like to discuss the best 
approach to take here and possibly contribute it back to Flink.

I think a good solution would be to bring HybridSource to the Table / SQL API. 
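
Today HybridSource exists only in the DataStream API, along the lines of the 
sketch below (a minimal example; I'm assuming file-based history on S3 and a 
Kafka topic for the live data, and exact connector class names may differ 
between Flink versions):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    long switchTimestamp = 1700000000000L; // where the historical data in S3 ends

    // Bounded source over the historical data.
    FileSource<String> historical =
        FileSource.forRecordStreamFormat(
                new TextLineInputFormat(), new Path("s3://bucket/history/"))
            .build();

    // Unbounded Kafka source that picks up where the history ends.
    KafkaSource<String> live =
        KafkaSource.<String>builder()
            .setBootstrapServers("kafka:9092")
            .setTopics("events")
            .setStartingOffsets(OffsetsInitializer.timestamp(switchTimestamp + 1))
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

    HybridSource<String> source =
        HybridSource.builder(historical).addSource(live).build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "hybrid-source").print();
    env.execute("hybrid-source-sketch");
  }
}

A Table / SQL counterpart would let us declare a single table that reads the S3 
history first and then switches to Kafka, so the SQL query itself would not 
have to change for bootstrapping.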

Another thought was to take the SQL, replace the unbounded sources with bounded 
ones, and run the job. Then take a savepoint at the end and use it to bootstrap 
the streaming job. The problems I see here:
- we have no control over operator UIDs or the final table plan, so the plan of 
the bounded job may end up slightly different from that of the streaming job 
(more on this below).
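
For reference, the only handle over operator UIDs that I know of in the 
Table / SQL stack is the compiled plan feature from FLIP-190. A rough sketch is 
below (the file path and the statement are placeholders), but as far as I can 
tell it doesn't help when the bounded and the unbounded variants optimize to 
different plans:

import org.apache.flink.table.api.CompiledPlan;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.PlanReference;
import org.apache.flink.table.api.TableEnvironment;

public class CompiledPlanSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Compile the streaming query once; the JSON plan pins the exec nodes
    // (and the operator UIDs derived from them).
    String insertStatement = "INSERT INTO daily_aggregates SELECT ..."; // full query from above
    CompiledPlan plan = tEnv.compilePlanSql(insertStatement);
    plan.writeToFile("/tmp/daily_aggregates_plan.json");

    // Later runs execute the pinned plan instead of re-optimizing the SQL.
    tEnv.loadPlan(PlanReference.fromFile("/tmp/daily_aggregates_plan.json")).execute();
  }
}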


-- 
Sincerely,
Ilya Soin
