Hi Gabor,

Thanks for your answers!
I'm good with naming it 'savepoint' with an extra option like 'mode'
specifying the reading type, or suffixes in names like 'savepoint-metadata'
for each type of state. For the second option, does it mean the current
connector will be named 'savepoint-keyed'? Well, I slightly prefer a short
name. As for connector='metadata', it seems unclear to me (what is the
metadata of Flink? :D), so I don't recommend this option.

Best,
Zakelly

On Wed, Dec 18, 2024 at 10:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi Zakelly,
>
> Thanks for your comments, please find my answers inline.
>
> BR,
> G
>
> On Wed, Dec 18, 2024 at 2:42 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi Gabor,
> >
> > Thanks for the FLIP! It's very convenient for users to use SQL to read
> > checkpoint data, so overall I'd +1 for this FLIP. Here are my
> > thoughts/questions:
> >
> > 1. I see this connector is designed to read the checkpoint/savepoint
> > data, so I'd suggest naming this connector 'checkpoint' or 'savepoint',
> > which is more intuitive. The "connector='state'" makes me think this is
> > reading the runtime state. Also, the 'state.path' can be changed to
> > 'checkpoint.path'.
>
> Renaming is a possibility, I'm fine with either way in terms of naming.
> The only thing I would consider is that later on it would be good to add
> a metadata reading possibility. In that case we can either provide
> "savepoint" for data reading and "savepoint_metadata" or simply
> "metadata" for meta reading (see the 3rd bullet for another approach).
> Please share your thoughts on this.
>
> > 2. IIUC this connector is a bounded source and batch execution is
> > preferred, right? And it cannot read checkpoints while the original job
> > is running and doing periodic checkpoints. Thus I'd prefer the name
> > 'savepoint', as the lifecycle of savepoints is managed by the user
> > instead of the Flink job.
> We can call it savepoint, just agree on the metadata naming as well so
> we have an overall picture.
>
> > 3. As you pictured, there might be more connectors for states other
> > than keyed state, so I'm thinking about providing them in one connector
> > with an option 'state.type' enumerating 'keyed', 'non-keyed'
> > ('operator'), and so on. Well, for the first version this is not a
> > required option. Seems more organized. WDYT?
>
> Yeah, later on we must provide such a config parameter somehow.
> A couple of things to consider:
> - Maybe we can add "metadata" as a possible value here instead of
> "connector=X"
> - I'm thinking about whether we can come up with something where we can
> mix metadata, keyed and non-keyed data in a single SQL statement.
> Without super deep consideration I would say no, but I'm going to think
> it through.
>
> > 4. I'm not sure if the general option 'key.format' is needed; would
> > using 'fields.k.format', where 'k' is the name of the SQL primary
> > column, be feasible?
>
> I think that's a nice catch of a simplification.
>
> > 5. Reading window state[1] is unsupported, right?
>
> Yeah, it is not yet covered here and can come in a later FLIP.
>
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/#window-state
> >
> > Best,
> > Zakelly
> >
> > On Wed, Dec 18, 2024 at 5:46 PM Gabor Somogyi
> > <gabor.g.somo...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to start a discussion of FLIP-496: SQL connector for keyed
> > > state data [1].
> > > Feel free to add your thoughts to make this feature better.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+state+data
> > >
> > > BR,
> > > G
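
[Editor's note] To make the options under discussion concrete, here is a
minimal, hypothetical DDL sketch of what the connector could look like if
the naming proposals in this thread were adopted. Every name below — the
'savepoint' connector name, 'state.path', 'state.type', 'fields.k.format',
the column names, and the path — is either a proposal from this thread or
an illustrative placeholder, not a finalized or released API:

```sql
-- Hypothetical table over keyed state in a savepoint; all option names
-- are proposals from the FLIP-496 discussion, not a shipped interface.
CREATE TABLE keyed_state (
  k BIGINT,   -- the key column of the keyed state
  v STRING    -- the value column
) WITH (
  'connector' = 'savepoint',            -- proposed rename from 'state'
  'state.path' = '/path/to/savepoint',  -- placeholder savepoint location
  'state.type' = 'keyed',               -- proposed enum: 'keyed', 'non-keyed', ...
  'fields.k.format' = 'raw'             -- per-column format replacing a global 'key.format'
);
```

Under this shape, a plain batch query such as `SELECT k, v FROM
keyed_state` would read the savepoint as a bounded source, matching the
batch-execution expectation raised in point 2 above.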