Hi Feng Thanks for your update.
I found there are two options in `Proposed Changes`, can you put the unselected option in `Rejected Alternatives`? I think this may help us better understand your proposal Best, Shammon FY On Thu, Apr 13, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid> wrote: > Hi Feng, > > Thanks for raising this FLIP. I am still confused after completely reading > the thread with following questions: > > 1. Naming confusion - registerCatalog() and addCatalog() have no big > difference based on their names. One of them is responsible for data > persistence. How about persistCatalog()? > 2. As you mentioned that Map<String, Catalog> catalogs is used as a cache > and catalogStore is used for data persistence. I would suggest describing > their purpose conceptually and clearly in the FLIP. Some common cache > features should be implemented, i.e. data in the cache and the store should > be consistent. Same Catalog instance should be found in the store and in > the cache(either it has been initialized or it will be lazy initialized) > for the same catalog name. The consistency will be taken care of while > updating the catalog. > 3. As the above discussion moves forward, the option 2 solution looks more > like a replacement of option 1, because, afaiu, issues mentioned > previously with option 1 are not solved yet. Do you still want to propose > both options and ask for suggestions for both of them? > 4. After you updated the FLIP, there are some inconsistent descriptions in > the content. Would you like to clean them up? Thanks! > > Best regards, > Jing > > > On Fri, Apr 7, 2023 at 9:24 AM Feng Jin <jinfeng1...@gmail.com> wrote: > > > hi Shammon > > > > Thank you for your response, and I completely agree with your point of > > view. > > Initially, I may have over complicated the whole issue. First and > foremost, > > we need to consider the persistence of the Catalog's Configuration. > > If we only need to provide persistence for Catalog Configuration, we can > > add a toConfiguration method to the Catalog interface. > > This method can convert a Catalog instance to a Map<String, String> > > properties, and the default implementation will throw an exception. > > > > public interface Catalog { > > /** > > * Returns a map containing the properties of the catalog object. > > * > > * @return a map containing the properties of the catalog object > > * @throws UnsupportedOperationException if the implementing class does > > not override > > * the default implementation of this method > > */ > > default Map<String, String> toProperties() { > > throw new UnsupportedOperationException("Please implement > toProperties > > for this catalog"); > > } > > } > > > > The specific process is as follows: > > > > 1. If the user has configured a CatalogStore, the toConfiguration() > method > > will be called when registering a Catalog instance with > > registerCatalog(String catalogName, Catalog catalog). The Catalog > instance > > will be converted to a Map<String, String> properties and saved to the > > CatalogStore. If some Catalog instances do not implement this method, an > > exception will be thrown. > > > > 2. If the user does not use a CatalogStore, the toConfiguration() method > > will not be called, ensuring consistency with the original process. > > > > 3. Saving both the Map<String, Catalog> catalogs and the CatalogStore at > > the same time can also avoid conflicts > > > > > > For lazy initialization: > > we can start from the Catalog itself and provide a dedicated Catalog > > implementation for lazy loading, such as ConfigurationCatalog > > > > public class ConfigurationCatalog implements Catalog { > > ConfigurationCatalog(Map<String, String> properties) { > > } > > } > > > > I have added this design to the FLIP. > > > > > > Best, > > Feng > > > > On Thu, Apr 6, 2023 at 10:03 AM Shammon FY <zjur...@gmail.com> wrote: > > > > > Hi Feng > > > > > > Thanks for your answer. > > > > > > I have no questions about the functionality of `CatalogStore`, but the > > > different operations of multiple `registerCatalog` or `storeCatalog` in > > > `CatalogManager`. The implementation in Trino is also the same, the > > > `CatalogManager` in trino only has one `createCatalog`, which will save > > the > > > catalog to memory and `CatalogStore`. > > > > > > I think we don't need to add another `registerCatalog` or `addCatalog` > to > > > save a catalog in `Map<String, Catalog> catalogs` or `CatalogStore > > > catalogStore`. As you mentioned before, the `Map<String, Catalog> > > > catalogs` is a cache in `CatalogManager`. How about saving the catalog > in > > > `Map<String, Catalog> catalogs` and `CatalogStore catalogStore` > together > > > when it is registered or added in `CatalogManager`? > > > > > > Best, > > > Shammon FY > > > > > > > > > On Tue, Apr 4, 2023 at 5:22 PM Feng Jin <jinfeng1...@gmail.com> wrote: > > > > > > > Thank you for your reply. I am very sorry for the misunderstanding > > caused > > > > by my deviation from the original discussion. > > > > > > > > @Shammon > > > > > I found there is a pre-discussion [1] for this FLIP > > > > Yes, there was indeed such a discussion before. However, in > designing > > > the > > > > whole solution, I found that the logic of CatalogManager itself > doesn't > > > > need to change much. *We cannot only persist Catalog instances > > > themselves*, > > > > so exposing only registerCatalog(String catalogName, Catalog catalog) > > > might > > > > not be enough to save Catalogs, because in the end we still need to > > save > > > > the configurations corresponding to the Catalog instances. > Therefore, > > I > > > > decided to introduce the CatalogStore interface for configuration > > > > persistence. Regarding this part, I also took inspiration from > Trino's > > > > implementation[1]. > > > > > > > > > > > > @yuxia > > > > > > > > > 1: The mechanism of handling catalog with the same name looks a > > little > > > of > > > > complex to me. > > > > Thank you for the suggestion. I will provide a detailed description > and > > > > code examples for this part, and add it to the FLIP. > > > > > > > > > 2: TBH, the method name `addCatalog` still looks confused to me. > > IIUC, > > > > this method is for storing catalog to CatalogStore, how about > renaming > > it > > > > to `storeCatalog`? It's very personal opinion, you can decide to take > > it > > > or > > > > not by your self. > > > > StoreCatalog looks more intuitive to me. I don't see any problem with > > it. > > > > > > > > > 3.For CREATE CATALOG statement, which method will be called? > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can > add a > > > > catalog to store with SQL stement. > > > > For CREATE CATALOG, my original design was to add it directly to the > > > > CatalogStore, but this would disrupt the existing logic. Therefore, I > > > think > > > > we can do both: save the configuration to the CatalogStore and > > initialize > > > > the Catalog instance at the same time > > > > > > > > > 3. Is it really neccessary to provide a default implmentation for > > > > interface `CatalogStoreFactory`? > > > > I think it is necessary, otherwise we would need to introduce an > > > additional > > > > Map to store the configuration for lazy loading. > > > > > > > > > 4: About asynchronous registration for catalog. > > > > I don't think registerCatalog(String catalogName, Catalog catalog) > can > > be > > > > made into an asynchronous interface because Catalog is already an > > > instance. > > > > > > > > > It looks more like lazy initialization for catalog than > asynchronous > > > > registration, right? > > > > Indeed, my description was inaccurate. It should be lazy registration > > > > instead of asynchronous registration. I have already updated the > title > > of > > > > the FLIP. > > > > > > > > > > > > > > > > > > > > [1]. > > > > > > > > > > > > > > https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/connector/CatalogStore.java > > > > > > > > > > > > On Tue, Apr 4, 2023 at 4:27 PM Shammon FY <zjur...@gmail.com> wrote: > > > > > > > > > Hi Feng > > > > > > > > > > I think if there is a `registerCatalog` method in `CatalogManager`, > > it > > > > will > > > > > confuse users whether a method named `addCatalog` or `storeCatalog` > > is > > > > > added. > > > > > > > > > > And as you mentioned, the memory catalog is a `cache`, I think the > > > > concept > > > > > of `cache` should not be exposed to users. > > > > > > > > > > I found there is a pre-discussion [1] for this FLIP. Please correct > > me > > > if > > > > > I'm wrong, IIUC, the conclusion of that discussion is to use > > > > > `CatalogManager` as an interface and implement it for different > > stores > > > > such > > > > > as memory, file and external system. > > > > > > > > > > I think there is a gap between the current FLIP design and that > > > > conclusion. > > > > > What about the proposal of the discussion in thread [1] ? > > > > > > > > > > > > > > > [1] > https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70 > > > > > > > > > > > > > > > Best, > > > > > Shammon FY > > > > > > > > > > > > > > > On Tue, Apr 4, 2023 at 3:41 PM yuxia <luoyu...@alumni.sjtu.edu.cn> > > > > wrote: > > > > > > > > > > > Thanks Feng for driving this FLIP. > > > > > > I have few comments: > > > > > > 1: The mechanism of handling catalog with the same name looks a > > > little > > > > of > > > > > > complex to me. I think it'll be better to explain it in the java > > doc > > > of > > > > > > these methods and give a brief example in this FLIP. > > > > > > > > > > > > 2: TBH, the method name `addCatalog` still looks confused to me. > > > IIUC, > > > > > > this method is for storing catalog to CatalogStore, how about > > > renaming > > > > it > > > > > > to `storeCatalog`? It's very personal opinion, you can decide to > > take > > > > it > > > > > or > > > > > > not by your self. > > > > > > > > > > > > 3: For CREATE CATALOG statement, which method will be called? > > > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can > > > add a > > > > > > catalog to store with SQL stement. > > > > > > > > > > > > > > > > > > 3: Is it really neccessary to provide a default implmentation for > > > > > > interface `CatalogStoreFactory`? > > > > > > > > > > > > > > > > > > 4: About asynchronous registration for catalog. > > > > > > > > > > > > > When creating a catalog with CREATE CATALOG, the asynchronous > > > > > > registration method is used by default. > > > > > > If asynchronous registration is the default behavior, it there > any > > > way > > > > > > that user can switch to synchronous registration just like > before? > > > > > > Will both method `addCatalog` and `registerCatalog` be > asynchronous > > > > > > registration? > > > > > > > > > > > > IIUC, in asynchronous registration, it may well that CREATE > CATALOG > > > > > > executes successfully, but then the following CREATE TABLE > > statement > > > > will > > > > > > fail for the catalog fail to open. > > > > > > I think it's a break change which should be highlighted in this > > FLIP, > > > > may > > > > > > be in compatibility part. > > > > > > > > > > > > > > > > > > BTW, by saying asynchronous registration, I would like to expect > > > there > > > > > > will be an executor to open or register catalog in the > background, > > > but > > > > > from > > > > > > your previous comments, > > > > > > "the Catalog instance will be initialized if it has not been > > > > initialized > > > > > > yet. If the initialization process fails, these statements will > not > > > be > > > > > > executed successfully." > > > > > > It looks more like lazy initialization for catalog than > > asynchronous > > > > > > registration, right? > > > > > > > > > > > > > > > > > > Best regards, > > > > > > Yuxia > > > > > > > > > > > > ----- 原始邮件 ----- > > > > > > 发件人: "Feng Jin" <jinfeng1...@gmail.com> > > > > > > 收件人: "dev" <dev@flink.apache.org> > > > > > > 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45 > > > > > > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog > > > > configuration > > > > > > and asynchronous registration > > > > > > > > > > > > Hi everyone, Thank you all for your interest in this DISCUSS. > > > > > > > > > > > > @Shammon > > > > > > > How to handle a catalog with the same name that exists for both > > of > > > > > them? > > > > > > > > > > > > I believe this is a crucial point. Based on my current personal > > > > > > understanding, the Map<String, Catalog> catalogs will serve as a > > > cache > > > > > > for instantiated catalogs and have the highest priority. > > > > > > > > > > > > There are three methods that can affect the Map<String, Catalog> > > > > > catalogs: > > > > > > > > > > > > 1. registerCatalog(String catalogName, Catalog catalog) > > > > > > > > > > > > This method puts the catalog instance into the Map<String, > Catalog> > > > > > > catalogs. > > > > > > > > > > > > 2. unregisterCatalog(String catalogName) > > > > > > > > > > > > This method removes the catalog instance corresponding to > > catalogName > > > > > > from the Map<String, Catalog> catalogs. > > > > > > > > > > > > 3. getCatalog(String catalogName) > > > > > > > > > > > > This method first retrieves the corresponding catalog instance > from > > > > > > the Map<String, Catalog> catalogs. If the catalog does not exist, > > it > > > > > > retrieves the corresponding configuration from the CatalogStore, > > > > > > initializes it, and puts the initialized Catalog instance into > the > > > > > > Map<String, Catalog> catalogs. > > > > > > > > > > > > The following two methods only modify the configuration in the > > > > > > CatalogStore: > > > > > > > > > > > > 1. addCatalog(String catalogName, Map<String, String> properties) > > > > > > > > > > > > This method saves the properties to the catalogStore and checks > > > > > > whether there is a catalogName with the same name. > > > > > > > > > > > > 2. removeCatalog(String catalogName) > > > > > > This method removes the specified configuration of the specified > > > > > > catalogName in the catalogStore. > > > > > > > > > > > > The following are possible conflict scenarios: > > > > > > > > > > > > 1. When the corresponding catalogName already exists in the > > > > > > CatalogStore but not in the Map<String, Catalog>, the > > > > > > registerCatalog(String catalogName, Catalog catalog) method can > > > > > > succeed and be directly saved to the Map<String, Catalog> > catalogs. > > > > > > > > > > > > 2. When the corresponding catalogName already exists in both the > > > > > > CatalogStore and the Map<String, Catalog>, the > > registerCatalog(String > > > > > > catalogName, Catalog catalog) method will fail. > > > > > > > > > > > > 3. When the corresponding catalogName already exists in the > > > > > > Map<String, Catalog>, the addCatalog(String catalogName, > > Map<String, > > > > > > String> properties) method can directly save the properties to > the > > > > > > catalogStore, but the getCatalog(String catalogName) method will > > not > > > > > > use the new properties for initialization because the > corresponding > > > > > > catalog instance already exists in catalogs and will be > > prioritized. > > > > > > Therefore, using the unregisterCatalog(String catalogName) method > > to > > > > > > remove the instance corresponding to the original catalogName is > > > > > > necessary. > > > > > > > > > > > > > > > > > > > > > > > > > I think it will confuse users that `registerCatalog(String > > > > > > catalogName,Catalog catalog)` in the `Map<String, Catalog> > > catalogs` > > > > and > > > > > > `registerCatalog(String catalogName, Map<String, String> > > properties) > > > > > > > > > > > > This could potentially lead to confusion. I suggest changing the > > > > > > method name, perhaps to addCatalog(String catalogName, > Map<String, > > > > > > String> properties), as previously mentioned > > > > > > > > > > > > > > > > > > @Hang > > > > > > > add `registerCatalog(String catalogName,Catalog catalog, > > > > > > boolean lazyInit)` method > > > > > > > > > > > > Since a catalog is already an instance, adding the "lazyInit" > > > > > > parameter to the registerCatalog(String catalogName, Catalog > > catalog) > > > > > > method may not necessarily result in lazy initialization. > > > > > > > > > > > > > Do we need to think about encryption > > > > > > > > > > > > I think encryption is necessary, but perhaps the encryption logic > > > > > > should be implemented in the CatalogStore when the data is > actually > > > > > > saved. For instance, the FileCatalogStore could encrypt the data > > when > > > > > > it is saved to a file, while the MemoryCatalogStore does not > > require > > > > > > encryption. > > > > > > > > > > > > > Do we really need the `MemoryCatalogStore`? > > > > > > > > > > > > I think it is necessary to have a MemoryCatalogStore as the > default > > > > > > implementation, which saves Catalog configurations in memory. > > > > > > Otherwise, if we want to implement asynchronous loading of > Catalog, > > > we > > > > > > would need to introduce additional cache. > > > > > > > > > > > > > > > > > > @Xianxun > > > > > > > > > > > > > What if asynchronous registration failed? > > > > > > > > > > > > This is also a critical concern. When executing DDL, DQL, or DML > > > > > > statements that reference a specific Catalog, the Catalog > instance > > > > > > will be initialized if it has not been initialized yet. If the > > > > > > initialization process fails, these statements will not be > executed > > > > > > successfully. > > > > > > > > > > > > > nt. The Map<String, Catalog> catalogs and CatalogStore > > catalogstore > > > > in > > > > > > the CatalogManager all have the same catalog name, but > correponding > > > to > > > > > > different catalog instances. > > > > > > > > > > > > This issue can be resolved by referring to the previous > responses. > > > The > > > > > > key principle is that the Catalog that has been most recently > used > > > > > > will be given priority for subsequent use. > > > > > > > > > > > > On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye < > > yesorno828...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > Hi Feng, > > > > > > > > > > > > > > Thanks for driving this Flip, I do believe this Flip could be > > > helpful > > > > > > for users. The firm I work for also manages a lot of catalogs, > and > > > the > > > > > > submission of tasks becomes slow because of loading a number of > > > > catalogs. > > > > > > We obtain the catalogs in the user's Flink SQL through regular > > > > expression > > > > > > to avoid loading all catalogs to improve the speed of task > > > submission. > > > > > > After reading flip-295, I have some questions: > > > > > > > > > > > > > > 1. When creating a catalog with CREATE CATALOG, the > asynchronous > > > > > > registration method is used by default. What if asynchronous > > > > registration > > > > > > failed? And how to implement asynchronous registration? > > > > > > > > > > > > > > 2. I also have the same question with Shammon’s first comment. > > The > > > > > > Map<String, Catalog> catalogs and CatalogStore catalogstore in > the > > > > > > CatalogManager all have the same catalog name, but correponding > to > > > > > > different catalog instances. For example: catalog1-> jdbcCatalog > in > > > the > > > > > > catalogs, but catalog1 -> hiveCatalog in the catalogstore. Would > > this > > > > > case > > > > > > happen? Maybe we should define `catalogs` and `catalogstore` more > > > > > clearly. > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > Xianxun > > > > > > > > > > > > > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道: > > > > > > > > > > > > > > > > MemoryCatalogStore > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >