Hi Gabor,
Thanks for the FLIP! It's very convenient for users to use SQL to read
checkpoint data, so overall I'd +1 for this FLIP. Here are my
thoughts/questions:
1. I see this connector is designed to read the checkpoint/savepoint
data, so I'd suggest naming this connector as 'checkpoint' or 'savepoint',
which is more intuitive. The "connector='state'" makes me think this is
reading the runtime state. Also the 'state.path' can be changed to
'checkpoint.path'.
2. IIUC this connector is a bounded source and batch execution is
preferred, right? And it cannot read checkpoints while the original job is
running and doing periodic checkpoints. Thus I'd prefer the name
'savepoint', as the lifecycle of savepoints is managed by the user instead
of the Flink job.
3. As you pictured, there might be more connectors for states other than
keyed state, so I'm thinking about providing them in one connector with an
option 'state.type' enumerating 'keyed', 'non-keyed' ('operator'), and so
on. Well for the first version this is not a required option. Seems more
organized. WDYT?
4. I'm not sure if the general option 'key.format' is needed, would
using 'fields.k.format' while the 'k' is the name of the SQL primary column
be feasible?
5. Reading window state[1] is unsupported, right?
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/#window-state
Best,
Zakelly
On Wed, Dec 18, 2024 at 5:46 PM Gabor Somogyi <[email protected]>
wrote:
> Hi All,
>
> I'd like to start a discussion of FLIP-496: SQL connector for keyed state
> data [1].
> Feel free to add your thoughts to make this feature better.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+state+data
>
> BR,
> G
>