Hi Flink community,

We have quite a complex SQL job: it unions 5 topics, deduplicates by key and does some daily aggregations. The state TTL is 40 days. We want to be able to bootstrap its state from S3 or ClickHouse, and ideally we'd like a general solution that we can reuse for other SQL jobs as well.
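To make it concrete, the query has roughly this shape (table and column names are made up, and only 2 of the 5 topics are shown):

INSERT INTO daily_totals
SELECT
  `key`,
  DATE_FORMAT(event_time, 'yyyy-MM-dd') AS `day`,
  SUM(amount) AS total
FROM (
  -- deduplication: keep the latest row per key
  SELECT `key`, event_time, amount
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY event_time DESC) AS rn
    FROM (
      SELECT * FROM topic_a
      UNION ALL
      SELECT * FROM topic_b
    ) unioned
  ) ranked
  WHERE rn = 1
) deduped
GROUP BY `key`, DATE_FORMAT(event_time, 'yyyy-MM-dd');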
So far I haven’t found a working solution to this. I’d like to discuss the best approach to take here and possibly contribute it to Flink.

I think a good solution would be to bring HybridSource to the Table / SQL API. Another thought was to take the same SQL, replace the unbounded sources with bounded ones (a rough sketch of the swap is below), and run the job; once the bounded sources are exhausted, take a savepoint and use it to bootstrap the streaming job. The problem I see here:
- we have no control over the operator UIDs or the final table plan, so it’s possible the plan of the batch job will be slightly different from that of the streaming job, which would make the savepoint incompatible.
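For the bootstrap run, the source swap itself could look roughly like this. The schema, connector options and S3 path are made up; the real tables would mirror the topics and their archived data:

-- unbounded source used by the streaming job (illustrative schema)
CREATE TABLE topic_a (
  `key`      STRING,
  amount     DOUBLE,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic_a',
  'properties.bootstrap.servers' = 'kafka:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- bounded counterpart for the bootstrap run: same schema, but backed by
-- the S3 archive of the topic (assuming it is stored as Parquet)
CREATE TABLE topic_a_bounded
WITH (
  'connector' = 'filesystem',
  'path' = 's3://my-bucket/archive/topic_a/',
  'format' = 'parquet'
)
LIKE topic_a (EXCLUDING OPTIONS);

The query itself would stay the same; only the FROM targets would point at the bounded tables for the bootstrap run.

--
Sincerely,
Ilya Soin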