This feature would be an awesome addition! I'm looking forward to it.

On Mon, Apr 24, 2023 at 3:59 PM Илья Соин <ilya.soin...@gmail.com> wrote:

> Thank you, Shammon FY
>
> --
> *Sincerely,*
> *Ilya Soin*
>
> On 24 Apr 2023, at 15:19, Shammon FY <zjur...@gmail.com> wrote:
>
> Thanks Илья, there is already a FLIP [1] and a discussion thread [2] about
> hybrid source. You can follow the progress there, and you are welcome to
> join the discussion.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235836225
> [2] https://lists.apache.org/thread/nbf3skopy3trtj37jcovmt6ktcgst4w8
>
> Best,
> Shammon FY
>
>
> On Mon, Apr 24, 2023 at 3:30 PM Илья Соин <ilya.soin...@gmail.com> wrote:
>
>> Hi Shammon FY,
>>
>> I haven’t tried it because AFAIK it’s only available in the DataStream
>> API, while our job is written in SQL. I’m thinking of writing a custom
>> HybridDynamicTableSource that uses HybridSource under the hood. That
>> should make it possible to bootstrap any SQL / Table API job. Maybe it’s
>> something worth adding to the Flink distribution?
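A minimal sketch of such a wrapper might look like the following. The class name and wiring are hypothetical (no such class exists in Flink today); a real connector would also need a DynamicTableSourceFactory, schema handling, and a way to build the underlying sources from table options.

```java
// Hypothetical sketch (not an existing Flink class): a ScanTableSource
// that hands a pre-built DataStream HybridSource to the SQL planner.
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceProvider;
import org.apache.flink.table.data.RowData;

public class HybridDynamicTableSource implements ScanTableSource {

    private final HybridSource<RowData> hybridSource;

    public HybridDynamicTableSource(HybridSource<RowData> hybridSource) {
        this.hybridSource = hybridSource;
    }

    @Override
    public ChangelogMode getChangelogMode() {
        // The composite source only emits inserts.
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
        // The planner runs the HybridSource like any other FLIP-27 source.
        return SourceProvider.of(hybridSource);
    }

    @Override
    public DynamicTableSource copy() {
        return new HybridDynamicTableSource(hybridSource);
    }

    @Override
    public String asSummaryString() {
        return "HybridDynamicTableSource";
    }
}
```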
>>
>> --
>> *Sincerely,*
>>
>> *Ilya Soin*
>>
>> On 24 Apr 2023, at 03:37, Shammon FY <zjur...@gmail.com> wrote:
>>
>> Hi Илья
>>
>> I think HybridSource may be a good way. Have you tried it before? Or have
>> you encountered any problems?
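For reference, DataStream-API usage of HybridSource looks roughly like the sketch below: read the historical backlog from files first, then switch over to Kafka. The paths, topic, broker address, and schema are placeholders.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded phase: historical data, e.g. from S3.
        FileSource<String> fileSource =
                FileSource.forRecordStreamFormat(
                                new TextLineInputFormat(), new Path("s3://bucket/backlog/"))
                        .build();

        // Unbounded phase: live data from Kafka.
        KafkaSource<String> kafkaSource =
                KafkaSource.<String>builder()
                        .setBootstrapServers("broker:9092")
                        .setTopics("events")
                        .setStartingOffsets(OffsetsInitializer.earliest())
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .build();

        // Chain the two: file source runs to completion, then Kafka takes over.
        HybridSource<String> hybridSource =
                HybridSource.builder(fileSource).addSource(kafkaSource).build();

        env.fromSource(hybridSource, WatermarkStrategy.noWatermarks(), "hybrid-source")
                .print();
        env.execute("hybrid-source-example");
    }
}
```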
>>
>> Best,
>> Shammon FY
>>
>> On Fri, Apr 21, 2023 at 5:59 PM Илья Соин <ilya.soin...@gmail.com> wrote:
>>
>>> Hi Flink community,
>>>
>>> We have a quite complex SQL job: it unions 5 topics, deduplicates by
>>> key, and performs some daily aggregations. The state TTL is 40 days. We
>>> want to be able to bootstrap its state from S3 or ClickHouse, and we
>>> would like a general solution that can be reused for other SQL jobs as
>>> well.
>>>
>>> So far I haven’t found a working solution. I’d like to discuss the best
>>> approach to take here and possibly contribute it to Flink.
>>>
>>> I think a good solution would be to bring HybridSource to Table / SQL
>>> API.
>>>
>>> Another thought was to take the SQL, replace the unbounded sources with
>>> bounded ones, and run the job. Then take a savepoint at the end and use
>>> it to bootstrap the streaming job. The problem I see here:
>>> - we have no control over operator UIDs or the final table plan, so the
>>> plan of the batch job may differ slightly from that of the streaming
>>> job.
>>>
>>>
>>> --
>>> *Sincerely,*
>>> *Ilya Soin*
>>>
>>
