This feature would be an awesome addition! I'm looking forward to it.

On Mon, Apr 24, 2023 at 3:59 PM Илья Соин <ilya.soin...@gmail.com> wrote:

> Thank you, Shammon FY
>
> --
> Sincerely,
> Ilya Soin
>
> On 24 Apr 2023, at 15:19, Shammon FY <zjur...@gmail.com> wrote:
>
> Thanks Илья, there's already a FLIP [1] and a discussion thread [2] about
> hybrid source. You can follow the progress there, and you're welcome to
> participate in the discussion.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235836225
> [2] https://lists.apache.org/thread/nbf3skopy3trtj37jcovmt6ktcgst4w8
>
> Best,
> Shammon FY
>
> On Mon, Apr 24, 2023 at 3:30 PM Илья Соин <ilya.soin...@gmail.com> wrote:
>
>> Hi Shammon FY,
>>
>> I haven't tried it because AFAIK it's only available in the DataStream
>> API, while our job is in SQL. I'm thinking of writing a custom
>> HybridDynamicTableSource which will use HybridSource under the hood. This
>> should make it possible to bootstrap any SQL / Table API job. Maybe it's
>> something worth adding to the Flink distribution?
>>
>> --
>> Sincerely,
>> Ilya Soin
>>
>> On 24 Apr 2023, at 03:37, Shammon FY <zjur...@gmail.com> wrote:
>>
>> Hi Илья,
>>
>> I think HybridSource may be a good way. Have you tried it before? Or have
>> you encountered any problems?
>>
>> Best,
>> Shammon FY
>>
>> On Fri, Apr 21, 2023 at 5:59 PM Илья Соин <ilya.soin...@gmail.com> wrote:
>>
>>> Hi Flink community,
>>>
>>> We have quite a complex SQL job: it unions 5 topics, deduplicates by key
>>> and does some daily aggregations. The state TTL is 40 days. We want to be
>>> able to bootstrap its state from S3 or ClickHouse, and we want a general
>>> solution that we can reuse for other SQL jobs as well.
>>>
>>> So far I haven't found a working solution. I'd like to discuss the best
>>> approach to take here and possibly contribute it to Flink.
>>>
>>> I think a good solution would be to bring HybridSource to the Table / SQL
>>> API.
>>>
>>> Another thought was to take the SQL, replace the unbounded sources with
>>> bounded ones, and run the job. Then take a savepoint at the end and use it
>>> to bootstrap the streaming job. The problem I see here:
>>> - we have no control over the operator uuids and the final table plan, so
>>> it's possible the plan of the batch job will differ slightly from that of
>>> the streaming job.
>>>
>>> --
>>> Sincerely,
>>> Ilya Soin
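For context on the DataStream-only HybridSource being discussed (the API a
HybridDynamicTableSource would presumably wrap), below is a minimal sketch of
bootstrapping from bounded historical files and then switching to Kafka. This
follows the documented HybridSource builder API; the S3 path, broker address,
topic name, and switch-over timestamp are placeholders for illustration, and
the job needs a Flink runtime plus the file and Kafka connector dependencies
to actually run.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridBootstrapSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded phase: replay historical data (placeholder S3 path).
        FileSource<String> historical = FileSource
                .forRecordStreamFormat(
                        new TextLineInputFormat(),
                        new Path("s3://my-bucket/history/"))
                .build();

        // Unbounded phase: continue from Kafka once the files are exhausted.
        // Starting offsets are taken from a (placeholder) switch-over timestamp
        // so that the live phase picks up where the historical data ends.
        KafkaSource<String> live = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.timestamp(1_682_000_000_000L))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // HybridSource chains the two: it reads the bounded source to
        // completion, then switches to the unbounded one.
        HybridSource<String> source = HybridSource.builder(historical)
                .addSource(live)
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "hybrid-source")
                .print();
        env.execute("hybrid-bootstrap-sketch");
    }
}
```

A Table/SQL equivalent would hide this chaining behind a table definition,
which is essentially what the proposed HybridDynamicTableSource would do.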