Thanks Feng for driving this FLIP. I have few comments: 1: The mechanism of handling catalog with the same name looks a little of complex to me. I think it'll be better to explain it in the java doc of these methods and give a brief example in this FLIP.
2: TBH, the method name `addCatalog` still looks confused to me. IIUC, this method is for storing catalog to CatalogStore, how about renaming it to `storeCatalog`? It's very personal opinion, you can decide to take it or not by your self. 3: For CREATE CATALOG statement, which method will be called? `registerCatalog` or `addCatalog`? I'm wondering whether user can add a catalog to store with SQL stement. 3: Is it really neccessary to provide a default implmentation for interface `CatalogStoreFactory`? 4: About asynchronous registration for catalog. > When creating a catalog with CREATE CATALOG, the asynchronous registration > method is used by default. If asynchronous registration is the default behavior, it there any way that user can switch to synchronous registration just like before? Will both method `addCatalog` and `registerCatalog` be asynchronous registration? IIUC, in asynchronous registration, it may well that CREATE CATALOG executes successfully, but then the following CREATE TABLE statement will fail for the catalog fail to open. I think it's a break change which should be highlighted in this FLIP, may be in compatibility part. BTW, by saying asynchronous registration, I would like to expect there will be an executor to open or register catalog in the background, but from your previous comments, "the Catalog instance will be initialized if it has not been initialized yet. If the initialization process fails, these statements will not be executed successfully." It looks more like lazy initialization for catalog than asynchronous registration, right? Best regards, Yuxia ----- 原始邮件 ----- 发件人: "Feng Jin" <jinfeng1...@gmail.com> 收件人: "dev" <dev@flink.apache.org> 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration Hi everyone, Thank you all for your interest in this DISCUSS. @Shammon > How to handle a catalog with the same name that exists for both of them? I believe this is a crucial point. Based on my current personal understanding, the Map<String, Catalog> catalogs will serve as a cache for instantiated catalogs and have the highest priority. There are three methods that can affect the Map<String, Catalog> catalogs: 1. registerCatalog(String catalogName, Catalog catalog) This method puts the catalog instance into the Map<String, Catalog> catalogs. 2. unregisterCatalog(String catalogName) This method removes the catalog instance corresponding to catalogName from the Map<String, Catalog> catalogs. 3. getCatalog(String catalogName) This method first retrieves the corresponding catalog instance from the Map<String, Catalog> catalogs. If the catalog does not exist, it retrieves the corresponding configuration from the CatalogStore, initializes it, and puts the initialized Catalog instance into the Map<String, Catalog> catalogs. The following two methods only modify the configuration in the CatalogStore: 1. addCatalog(String catalogName, Map<String, String> properties) This method saves the properties to the catalogStore and checks whether there is a catalogName with the same name. 2. removeCatalog(String catalogName) This method removes the specified configuration of the specified catalogName in the catalogStore. The following are possible conflict scenarios: 1. When the corresponding catalogName already exists in the CatalogStore but not in the Map<String, Catalog>, the registerCatalog(String catalogName, Catalog catalog) method can succeed and be directly saved to the Map<String, Catalog> catalogs. 2. When the corresponding catalogName already exists in both the CatalogStore and the Map<String, Catalog>, the registerCatalog(String catalogName, Catalog catalog) method will fail. 3. When the corresponding catalogName already exists in the Map<String, Catalog>, the addCatalog(String catalogName, Map<String, String> properties) method can directly save the properties to the catalogStore, but the getCatalog(String catalogName) method will not use the new properties for initialization because the corresponding catalog instance already exists in catalogs and will be prioritized. Therefore, using the unregisterCatalog(String catalogName) method to remove the instance corresponding to the original catalogName is necessary. > I think it will confuse users that `registerCatalog(String > catalogName,Catalog catalog)` in the `Map<String, Catalog> catalogs` and > `registerCatalog(String catalogName, Map<String, String> properties) This could potentially lead to confusion. I suggest changing the method name, perhaps to addCatalog(String catalogName, Map<String, String> properties), as previously mentioned @Hang > add `registerCatalog(String catalogName,Catalog catalog, boolean lazyInit)` method Since a catalog is already an instance, adding the "lazyInit" parameter to the registerCatalog(String catalogName, Catalog catalog) method may not necessarily result in lazy initialization. > Do we need to think about encryption I think encryption is necessary, but perhaps the encryption logic should be implemented in the CatalogStore when the data is actually saved. For instance, the FileCatalogStore could encrypt the data when it is saved to a file, while the MemoryCatalogStore does not require encryption. > Do we really need the `MemoryCatalogStore`? I think it is necessary to have a MemoryCatalogStore as the default implementation, which saves Catalog configurations in memory. Otherwise, if we want to implement asynchronous loading of Catalog, we would need to introduce additional cache. @Xianxun > What if asynchronous registration failed? This is also a critical concern. When executing DDL, DQL, or DML statements that reference a specific Catalog, the Catalog instance will be initialized if it has not been initialized yet. If the initialization process fails, these statements will not be executed successfully. > nt. The Map<String, Catalog> catalogs and CatalogStore catalogstore in the > CatalogManager all have the same catalog name, but correponding to different > catalog instances. This issue can be resolved by referring to the previous responses. The key principle is that the Catalog that has been most recently used will be given priority for subsequent use. On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <yesorno828...@gmail.com> wrote: > > Hi Feng, > > Thanks for driving this Flip, I do believe this Flip could be helpful for > users. The firm I work for also manages a lot of catalogs, and the submission > of tasks becomes slow because of loading a number of catalogs. We obtain the > catalogs in the user's Flink SQL through regular expression to avoid loading > all catalogs to improve the speed of task submission. After reading flip-295, > I have some questions: > > 1. When creating a catalog with CREATE CATALOG, the asynchronous registration > method is used by default. What if asynchronous registration failed? And how > to implement asynchronous registration? > > 2. I also have the same question with Shammon’s first comment. The > Map<String, Catalog> catalogs and CatalogStore catalogstore in the > CatalogManager all have the same catalog name, but correponding to different > catalog instances. For example: catalog1-> jdbcCatalog in the catalogs, but > catalog1 -> hiveCatalog in the catalogstore. Would this case happen? Maybe we > should define `catalogs` and `catalogstore` more clearly. > > > Best regards, > > Xianxun > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道: > > > > MemoryCatalogStore > >