Hey Mayank,

Please see my feedback below:

1. One of the motivations of this FLIP is to improve security. However, the 
current design stores all connection information in the catalog, and each 
Flink SQL job reads it from the catalog during compilation. The connection 
information is passed between the SQL Gateway and the catalog in plaintext, 
which actually introduces new security risks.

2. The name "Connection" should be changed to something like ConnectionSpec to 
clearly indicate that it is an object containing only static properties and 
without a lifecycle. Putting the naming issue aside, I think the current model 
and hierarchy design is somewhat strange. Storing various kinds of connections 
(e.g., Kafka, MySQL) in the same Catalog with hierarchical identifiers like 
catalog-name.db-name.connection-name raises the following questions:  
 (1) What is the purpose of this hierarchical structure for the Connection object?  
 (2) If we can use a Connection to create a MySQL table, why can't we use a 
Connection to create a MySQL Catalog? (See the sketch right below.)
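To make question (2) concrete, here is a purely hypothetical sketch; the catalog
DDL and the 'connection' option below are NOT part of the FLIP, they only
illustrate the asymmetry I am pointing at:
```sql
-- Hypothetical only: if a Connection can back a MySQL table, one would
-- expect it could also back a MySQL (JDBC) catalog.
CREATE OR REPLACE CONNECTION mysql_customer_db
WITH (
  'type' = 'jdbc',
  'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb'
);

-- Why is something like this not possible / not discussed?
CREATE CATALOG mysql_catalog
WITH (
  'type' = 'jdbc',
  'connection' = 'mysql_customer_db'   -- invented option, for illustration only
);
```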

3. Regarding the connector usage examples given in this FLIP:
```sql
1  -- Example 2: Using connection for jdbc tables 
2  CREATE OR REPLACE CONNECTION mysql_customer_db
3  WITH (
4    'type' = 'jdbc',
5    'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
6    'jdbc.connection.ssl.enabled' = 'true'
7  );
8  
9  CREATE TABLE customers (
10   customer_id INT,
11   PRIMARY KEY (customer_id) NOT ENFORCED
12 ) WITH (
13   'connector' = 'jdbc',
14   'jdbc.connection' = 'mysql_customer_db',
15   'jdbc.connection.ssl.enabled' = 'true',
16   'jdbc.connection.max-retry-timeout' = '60s',
17   'jdbc.table-name' = 'customers',
18   'jdbc.lookup.cache' = 'PARTIAL'
19 );
```
I see three issues from the SQL-semantics and connector-compatibility perspectives:

(1) Look at line 14: `mysql_customer_db` is the object identifier of a 
CONNECTION defined in SQL. However, this identifier is referenced via a string 
value inside the table's WITH clause, which feels hacky to me. A hypothetical 
alternative is sketched below.
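Just to illustrate what I mean, a more SQL-native shape could reference the
connection as an identifier in the DDL itself. This syntax is not proposed
anywhere; it is only a sketch:
```sql
-- Hypothetical sketch: reference the CONNECTION by its identifier so the
-- planner can resolve and validate it like any other catalog object,
-- instead of passing its name as a string-typed property.
CREATE TABLE customers (
  customer_id INT,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'table-name' = 'customers'
) CONNECTION mysql_customer_db;
```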

(2) Look at lines 14–16: the use of the specific prefix `jdbc.connection` will 
confuse users, because `connection.xx` may already be used as a prefix for 
existing configuration items.

(3) Look at lines 14–18: Why do all existing configuration options need to be 
prefixed with `jdbc`, even when they are not related to Connection properties?
This completely changes user habits. Is it backward compatible? For comparison,
see the current-style DDL sketch below.
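For reference, and relevant to both (2) and (3), this is roughly how the same
table is declared with the existing JDBC connector today. Note that
`connection.max-retry-timeout` already lives under a `connection.*` prefix and
none of the options carries a `jdbc.` prefix:
```sql
-- Current JDBC connector style (no 'jdbc.' prefix on the options):
CREATE TABLE customers (
  customer_id INT,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
  'table-name' = 'customers',
  'connection.max-retry-timeout' = '60s',  -- existing option already under connection.*
  'lookup.cache' = 'PARTIAL'
);
```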
 
In my opinion, Connection should be a model independent of both Catalog and 
Table, and it should be referenceable from all catalog/table/udf/model objects. 
It should be managed by a component such as a ConnectionManager to enable 
reuse. For security purposes, authentication mechanisms could be supported 
within the ConnectionManager. A rough sketch of such cross-object reuse follows 
below.
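To sketch what I mean by cross-object reuse (all syntax and option names below
are invented for illustration only, not a concrete proposal):
```sql
-- Hypothetical: one top-level Connection, not scoped to any catalog/database,
-- managed by a ConnectionManager and referenced by different object kinds.
CREATE CONNECTION kafka_prod
WITH (
  'type' = 'kafka',
  'bootstrap.servers' = 'broker-1.example.com:9092',
  'security.protocol' = 'SASL_SSL'
);

-- Reused by a table ...
CREATE TABLE orders (
  order_id BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'connection' = 'kafka_prod'   -- invented reference, resolved by the ConnectionManager
);

-- ... and referenceable in the same way from a catalog, UDF, or model definition.
```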

Best,
Leonard


> On Jun 4, 2025, at 02:04, Martijn Visser <martijnvis...@apache.org> wrote:
> 
> Hi all,
> 
> First of all, I think having a Connection resource is something that will
> be beneficial for Apache Flink. I could see that being extended in the
> future to allow for easier secret handling [1].
> In my mind, I'm comparing this proposal against SQL/MED from the ISO
> standard [2]. I do think that SQL/MED isn't a very user friendly syntax
> though, looking at Postgres for example [3].
> 
> I think it's a valid question if Connection should be considered with a
> catalog or database-level scope. @Ryan can you share something more, since
> you've mentioned "Note: I much prefer catalogs for this case. Which is what
> we use internally to manage connection properties". It looks like there
> isn't a strongly favoured approach looking at other vendors (e.g.,
> Databricks scopes it to a Unity catalog, Snowflake to a database
> level).
> 
> Also looking forward to Leonard's input.
> 
> Best regards,
> 
> Martijn
> 
> [1] https://issues.apache.org/jira/browse/FLINK-36818
> [2] https://www.iso.org/standard/84804.html
> [3] https://www.postgresql.org/docs/current/sql-createserver.html
> 
> On Fri, May 30, 2025 at 5:07 AM Leonard Xu <xbjt...@gmail.com> wrote:
> 
>> Hey Mayank.
>> 
>> Thanks for the FLIP, I went through it quickly and found some issues which
>> I think we need to discuss in depth later. As we’re on a short Dragon Boat
>> Festival break, could you kindly hold on this thread? We will come back to
>> continue the FLIP discussion afterwards.
>> 
>> Best,
>> Leonard
>> 
>> 
>>> On Apr 29, 2025, at 23:07, Mayank Juneja <mayankjunej...@gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I would like to open up for discussion a new FLIP-529 [1].
>>> 
>>> Motivation:
>>> Currently, Flink SQL handles external connectivity by defining endpoints
>>> and credentials in table configuration. This approach prevents
>> reusability
>>> of these connections and makes table definition less secure by exposing
>>> sensitive information.
>>> We propose the introduction of a new "connection" resource in Flink. This
>>> will be a pluggable resource configured with a remote endpoint and
>>> associated access key. Once defined, connections can be reused across
>> table
>>> definitions, and eventually for model definition (as discussed in
>> FLIP-437)
>>> for inference, enabling seamless and secure integration with external
>>> systems.
>>> The connection resource will provide a new, optional way to manage
>> external
>>> connectivity in Flink. Existing methods for table definitions will remain
>>> unchanged.
>>> 
>>> [1] https://cwiki.apache.org/confluence/x/cYroF
>>> 
>>> Best Regards,
>>> Mayank Juneja
>> 
>> 
