Re: [DISCUSS] FLIP-496: SQL connector for keyed state data

Gabor Somogyi Wed, 18 Dec 2024 06:35:58 -0800

Hi Zakelly,

Thanks for your comments, please find my answers inline.


BR,
G


On Wed, Dec 18, 2024 at 2:42 PM Zakelly Lan <[email protected]> wrote:

> Hi Gabor,
>
> Thanks for the FLIP! It's very convenient for users to use SQL to read
> checkpoint data, so overall I'd +1 for this FLIP. Here are my
> thoughts/questions:
>
>    1. I see this connector is designed to read the checkpoint/savepoint
>    data, so I'd suggest naming this connector as 'checkpoint' or
> 'savepoint',
>    which is more intuitive. The "connector='state'" makes me think this is
>    reading the runtime state. Also the 'state.path' can be changed to
>    'checkpoint.path'.
>
Rename is a possibility, I'm fine with either way in terms of naming.
The only thing what I would consider is that later on it would be good
to add metadata reading possibility. Such case we can either provide
"savepoint" for data reading and "savepoint_metadata" or simply "metadata"
for meta reading (See 3rd bullet for another approach). Please share your
thoughts on this.


>    2. IIUC this connector is a bounded source and batch execution is
>    preferred, right? And it cannot read checkpoints while the original job
> is
>    running and doing periodic checkpoints. Thus I'd prefer the name
>    'savepoint', as the lifecycle of savepoints is managed by the user
> instead
>    of the Flink job.
>
We can call it savepoint, just agree on the metadata naming as well to have
an overall picture.


>    3. As you pictured, there might be more connectors for states other than
>    keyed state, so I'm thinking about providing them in one connector with
> an
>    option 'state.type' enumerating 'keyed',  'non-keyed' ('operator'), and
> so
>    on. Well for the first version this is not a required option. Seems more
>    organized. WDYT?
>
Yeah, later on we must provide such config parameter somehow.
Couple of things to consider:
- Maybe we can add "metadata" as possible value here instead of
"connector=X"
- I'm thinking about whether we can come up with something
where we can mix metadata, keyed and non-keyed data in a single SQL
statement.
Without super deep consideration I would say no, but I'm going to think it
through.


>    4. I'm not sure if the general option 'key.format' is needed, would
>    using 'fields.k.format' while the 'k' is the name of the SQL primary
> column
>    be feasible?
>
I think that's a nice catch of a simplification.


>    5. Reading window state[1] is unsupported, right?
>
Yeah, not yet covered here and can come in a later FLIP.


>
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/#window-state
>
> Best,
> Zakelly
>
> On Wed, Dec 18, 2024 at 5:46 PM Gabor Somogyi <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I'd like to start a discussion of FLIP-496: SQL connector for keyed state
> > data [1].
> > Feel free to add your thoughts to make this feature better.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+state+data
> >
> > BR,
> > G
> >
>

Re: [DISCUSS] FLIP-496: SQL connector for keyed state data

Reply via email to