Quick response: thanks Mayank, Hao and Timo for the effort. The new proposal looks good, +1 from my side.
Could you draft (update) the current FLIP docs so that we can have more specific discussions later?

Best,
Leonard

> On June 26, 2025, at 15:06, Timo Walther <twal...@apache.org> wrote:
>
> Hi everyone,
>
> sorry for the late reply, feature freeze kept me busy. Mayank, Hao and I synced offline and came up with an improved proposal. Before we update the FLIP let me summarize the most important key facts that hopefully address most concerns:
>
> 1) SecretStore
> - Similar to CatalogStore, we introduce a SecretStore as the highest level in TableEnvironment.
> - SecretStore is initialized with options and potentially environment variables, including via EnvironmentSettings.withSecretStore(SecretStore).
> - The SecretStore is pluggable and discovered using the regular factory-approach.
> - For example, it could implement Azure Key Vault or other cloud provider secret stores.
> - Goal: Flink and Flink catalogs do not have to deal with sensitive data.
>
> 2) Connections
> - Connections are catalog objects identified with 3-part identifiers. 3-part identifiers are crucial for manageability of larger projects and align with existing catalog objects.
> - They contain connection details, e.g. URL, query parameters, and other configuration.
> - They do not contain secrets, but only pointers to secrets in the SecretStore.
>
> 3) Connection DDL
>
> CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
>   'type' = 'basic' | 'bearer' | 'jwt' | 'oauth' | ...,
>   ...
> )
>
> - Connection type is pluggable and discovered using the regular factory-approach.
> - The factory extracts secrets and puts them into SecretStore.
> - The factory leaves only non-confidential options that can be stored in a catalog.
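A minimal sketch of the factory behavior described in the bullets above, not the FLIP's actual API. Every name in it (`SecretStore.storeSecret`, `InMemorySecretStore`, `BasicConnectionFactory`, `toCatalogOptions`) is hypothetical, as is the assumption that `username` and `password` are the confidential options of a `basic` connection. It only illustrates how a connection factory could move secrets into a store and leave a pointer (`secret.store`, `secret.id`) in the options handed to the catalog.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical interface: the thread does not define the real SecretStore API.
interface SecretStore {
    // Stores the secret fields and returns an opaque id for later lookup.
    String storeSecret(Map<String, String> secretFields);

    // Returns the secret fields stored under the given id, or null if unknown.
    Map<String, String> getSecret(String secretId);
}

// Toy in-memory store for illustration only; a real implementation would
// delegate to an external product such as a cloud key vault.
final class InMemorySecretStore implements SecretStore {
    private final Map<String, Map<String, String>> secrets = new HashMap<>();

    @Override
    public String storeSecret(Map<String, String> secretFields) {
        String id = UUID.randomUUID().toString();
        secrets.put(id, new HashMap<>(secretFields));
        return id;
    }

    @Override
    public Map<String, String> getSecret(String secretId) {
        return secrets.get(secretId);
    }
}

// Hypothetical factory for 'type' = 'basic' connections.
final class BasicConnectionFactory {
    // Assumption: these two options are the confidential ones for this type.
    private static final Set<String> SECRET_KEYS = Set.of("username", "password");

    private final String storeName;
    private final SecretStore secretStore;

    BasicConnectionFactory(String storeName, SecretStore secretStore) {
        this.storeName = storeName;
        this.secretStore = secretStore;
    }

    // Splits the DDL options: secrets go to the store, everything else plus
    // a pointer to the stored secret is returned for the catalog to persist.
    Map<String, String> toCatalogOptions(Map<String, String> ddlOptions) {
        Map<String, String> secretFields = new LinkedHashMap<>();
        Map<String, String> catalogOptions = new LinkedHashMap<>();
        for (Map.Entry<String, String> option : ddlOptions.entrySet()) {
            if (SECRET_KEYS.contains(option.getKey())) {
                secretFields.put(option.getKey(), option.getValue());
            } else {
                catalogOptions.put(option.getKey(), option.getValue());
            }
        }
        if (!secretFields.isEmpty()) {
            catalogOptions.put("secret.store", storeName);
            catalogOptions.put("secret.id", secretStore.storeSecret(secretFields));
        }
        return catalogOptions;
    }
}
```

Under these assumptions, passing the options `type`, `url`, `username` and `password` through `toCatalogOptions` returns only `type`, `url`, `secret.store` and `secret.id`, where `secret.id` is a generated identifier; the catalog never sees the credentials themselves.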
>
> When executing:
> CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
>   'type' = 'basic',
>   'url' = 'api.example.com',
>   'username' = 'bob',
>   'password' = 'xyz'
> )
>
> The catalog will receive something similar to:
> CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
>   'type' = 'basic',
>   'url' = 'api.example.com',
>   'secret.store' = 'azure-key-vault',
>   'secret.id' = 'secretId'
> )
>
> - However, the exact property design is up to the connection factory.
>
> 4) Connection Usage
>
> CREATE TABLE t (...) USING CONNECTION mycat.mydb.OpenAPI;
>
> - MODEL, FUNCTION, TABLE DDL will support a USING CONNECTION keyword similar to BigQuery.
> - The connection will be provided in a table/model provider/function definition factory.
>
> 5) CatalogStore / Catalog Initialization
>
> Catalog store or catalog can make use of SecretStore to retrieve initial credentials for bootstrapping. All objects lower than catalog store/catalog can then use connections. If you think we still need system level connections, we can support CREATE SYSTEM CONNECTION GlobalName WITH (..) similar to SYSTEM functions, stored directly in a ConnectionManager in TableEnvironment. But for now I would suggest to start simple with per-catalog connections and later evolve the design.
>
> Dealing with secrets is a very sensitive topic and I'm clearly not an expert on it. This is why we should try to push the problem to existing solutions and not start storing secrets in Flink in any way. Thus, the interfaces will be defined very generically.
>
> Looking forward to your feedback.
>
> Cheers,
> Timo
>
> On 09.06.25 04:01, Leonard Xu wrote:
>> Thanks Timo for joining this thread.
>> I agree that this feature is needed by the community; the current disagreement is only about the implementation method or solution.
>> Your thoughts look generally good to me, looking forward to your proposal.
>> Best,
>> Leonard
>>> On June 6, 2025, at 22:46, Timo Walther <twal...@apache.org> wrote:
>>>
>>> Hi everyone,
>>>
>>> thanks for this healthy discussion. Looking at the high number of participants, it looks like we definitely want this feature. We just need to figure out the "how".
>>>
>>> This reminds me very much of the discussion we had for CREATE FUNCTION. There, we discussed whether functions should be named globally or catalog-specific. In the end, we decided on both `CREATE SYSTEM FUNCTION` and `CREATE FUNCTION`, satisfying both the data platform team of an organization (which might provide system functions) and individual data teams or use cases (scoped by catalog/database).
>>>
>>> Looking at other modern vendors like Snowflake there is SECRET (scoped to schema) [1] and API INTEGRATION [2] (scoped to account). So other vendors also offer global and per-team / per-use-case connection details.
>>>
>>> In general, I think fitting connections into the existing concepts for catalog objects (with three-part identifier) makes managing them easier. But I also see the need for global defaults.
>>>
>>> Btw keep in mind that a catalog implementation should only store metadata. Similar to how a CatalogTable doesn't store the actual data, a CatalogConnection should not store the credentials. It should only offer a factory that allows for storing and retrieving them. In real world scenarios a factory is most likely backed by a product like Azure Key Vault.
>>>
>>> So code-wise having a ConnectionManager that behaves similar to FunctionManager sounds reasonable.
>>>
>>> +1 for having special syntax instead of using properties. This allows accessing connections in tables, models, functions. And catalogs, if we agree to have global ones as well.
>>>
>>> What do you think?
>>>
>>> Let me give this some more thought and come back with a concrete proposal by early next week.
>>>
>>> Cheers,
>>> Timo
>>>
>>> [1] https://docs.snowflake.com/en/sql-reference/sql/create-secret
>>> [2] https://docs.snowflake.com/en/sql-reference/sql/create-api-integration
>>>
>>> On 04.06.25 10:47, Leonard Xu wrote:
>>>> Hey Mayank,
>>>> Please see my feedback below:
>>>> 1. One of the motivations of this FLIP is to improve security. However, the current design stores all connection information in the catalog, and each Flink SQL job reads from the catalog during compilation. The connection information is passed between SQL Gateway and the catalog in plaintext, which actually introduces new security risks.
>>>> 2. The name "Connection" should be changed to something like ConnectionSpec to clearly indicate that it is an object containing only static properties without a lifecycle. Putting aside the naming issue, I think the current model and hierarchy design is somewhat strange. Storing various kinds of connections (e.g., Kafka, MySQL) in the same Catalog with hierarchical identifiers like catalog-name.db-name.connection-name raises the following questions:
>>>> (1) What is the purpose of this hierarchical structure of the Connection object?
>>>> (2) If we can use a Connection to create a MySQL table, why can't we use a Connection to create a MySQL Catalog?
>>>> 3.
>>>> Regarding the connector usage examples given in this FLIP:
>>>> ```sql
>>>>  1 -- Example 2: Using connection for jdbc tables
>>>>  2 CREATE OR REPLACE CONNECTION mysql_customer_db
>>>>  3 WITH (
>>>>  4   'type' = 'jdbc',
>>>>  5   'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
>>>>  6   'jdbc.connection.ssl.enabled' = 'true'
>>>>  7 );
>>>>  8
>>>>  9 CREATE TABLE customers (
>>>> 10   customer_id INT,
>>>> 11   PRIMARY KEY (customer_id) NOT ENFORCED
>>>> 12 ) WITH (
>>>> 13   'connector' = 'jdbc',
>>>> 14   'jdbc.connection' = 'mysql_customer_db',
>>>> 15   'jdbc.connection.ssl.enabled' = 'true',
>>>> 16   'jdbc.connection.max-retry-timeout' = '60s',
>>>> 17   'jdbc.table-name' = 'customers',
>>>> 18   'jdbc.lookup.cache' = 'PARTIAL'
>>>> 19 );
>>>> ```
>>>> I see three issues from the SQL semantics and connector compatibility perspectives:
>>>> (1) Look at line 14: `mysql_customer_db` is an object identifier of a CONNECTION defined in SQL. However, this identifier is referenced via a string value inside the table's WITH clause, which feels hacky to me.
>>>> (2) Look at lines 14–16: the use of the specific prefix `jdbc.connection` will confuse users because `connection.xx` may already be used as a prefix for existing configuration items.
>>>> (3) Look at lines 14–18: Why do all existing configuration options need to be prefixed with `jdbc`, even if they're not related to Connection properties? This completely changes user habits. Is it backward compatible?
>>>> In my opinion, Connection should be a model independent of both Catalog and Table, and can be referenced by all catalog/table/udf/model objects. It should be managed by a component such as a ConnectionManager to enable reuse. For security purposes, authentication mechanisms could be supported within the ConnectionManager.
>>>> Best,
>>>> Leonard
>>>>> On June 4, 2025, at 02:04, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> First of all, I think having a Connection resource is something that will be beneficial for Apache Flink. I could see that being extended in the future to allow for easier secret handling [1].
>>>>> In my mind, I'm comparing this proposal against SQL/MED from the ISO standard [2]. I do think that SQL/MED isn't a very user friendly syntax though, looking at Postgres for example [3].
>>>>>
>>>>> I think it's a valid question if Connection should be considered with a catalog or database-level scope. @Ryan can you share something more, since you've mentioned "Note: I much prefer catalogs for this case. Which is what we use internally to manage connection properties". It looks like there isn't a strongly favoured approach looking at other vendors (e.g., Databricks scopes it to a Unity Catalog, Snowflake to the database level).
>>>>>
>>>>> Also looking forward to Leonard's input.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martijn
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-36818
>>>>> [2] https://www.iso.org/standard/84804.html
>>>>> [3] https://www.postgresql.org/docs/current/sql-createserver.html
>>>>>
>>>>> On Fri, May 30, 2025 at 5:07 AM Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>
>>>>>> Hey Mayank.
>>>>>>
>>>>>> Thanks for the FLIP. I went through this FLIP quickly and found some issues which I think we need to discuss in depth later. As we're on a short Dragon Boat Festival holiday, could you kindly hold this thread? We will be back to continue the FLIP discussion.
>>>>>>
>>>>>> Best,
>>>>>> Leonard
>>>>>>
>>>>>>
>>>>>>> On April 29, 2025, at 23:07, Mayank Juneja <mayankjunej...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to open up for discussion a new FLIP-529 [1].
>>>>>>>
>>>>>>> Motivation:
>>>>>>> Currently, Flink SQL handles external connectivity by defining endpoints and credentials in table configuration. This approach prevents reusability of these connections and makes table definitions less secure by exposing sensitive information.
>>>>>>> We propose the introduction of a new "connection" resource in Flink. This will be a pluggable resource configured with a remote endpoint and associated access key. Once defined, connections can be reused across table definitions, and eventually for model definition (as discussed in FLIP-437) for inference, enabling seamless and secure integration with external systems.
>>>>>>> The connection resource will provide a new, optional way to manage external connectivity in Flink. Existing methods for table definitions will remain unchanged.
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/x/cYroF
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Mayank Juneja