Hi Feng

I think if there is a `registerCatalog` method in `CatalogManager`, it will
confuse users whether a method named `addCatalog` or `storeCatalog` is
added.

And as you mentioned, the memory catalog is a `cache`, I think the concept
of `cache` should not be exposed to users.

I found there is a pre-discussion [1] for this FLIP. Please correct me if
I'm wrong, IIUC, the conclusion of that discussion is to use
`CatalogManager` as an interface and implement it for different stores such
as memory, file and external system.

I think there is a gap between the current FLIP design and that conclusion.
What about the proposal of the discussion in thread [1] ?


[1] https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70


Best,
Shammon FY


On Tue, Apr 4, 2023 at 3:41 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> Thanks Feng for driving this FLIP.
> I have few comments:
> 1: The mechanism of handling catalog with the same name looks a little of
> complex to me. I think it'll be better to explain it in the java doc of
> these methods and give a brief example in this FLIP.
>
> 2: TBH, the method name `addCatalog` still looks confused to me. IIUC,
> this method is for storing catalog to CatalogStore, how about renaming it
> to `storeCatalog`? It's very personal opinion, you can decide to take it or
> not by your self.
>
> 3: For CREATE CATALOG statement, which method will be called?
> `registerCatalog` or `addCatalog`? I'm wondering whether user can add a
> catalog to store with SQL stement.
>
>
> 3: Is it really neccessary to provide a default implmentation for
> interface `CatalogStoreFactory`?
>
>
> 4: About asynchronous registration for catalog.
>
> > When creating a catalog with CREATE CATALOG, the asynchronous
> registration method is used by default.
> If asynchronous registration is the default behavior, it there any way
> that user can switch to synchronous registration just like before?
> Will both method `addCatalog` and `registerCatalog` be asynchronous
> registration?
>
> IIUC, in asynchronous registration, it may well that CREATE CATALOG
> executes successfully, but then the following CREATE TABLE statement will
> fail for the catalog fail to open.
> I think it's a break change which should be highlighted in this FLIP, may
> be in compatibility part.
>
>
> BTW, by saying asynchronous registration, I would like to expect there
> will be an executor to open or register catalog in the background, but from
> your previous comments,
> "the Catalog instance will be initialized if it has not been initialized
> yet. If the initialization process fails, these statements will not be
> executed successfully."
> It looks more like lazy initialization for catalog than asynchronous
> registration, right?
>
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "Feng Jin" <jinfeng1...@gmail.com>
> 收件人: "dev" <dev@flink.apache.org>
> 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
> 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration
> and asynchronous registration
>
> Hi everyone, Thank you all for your interest in this DISCUSS.
>
> @Shammon
> > How to handle a catalog with the same name that exists for both of them?
>
> I believe this is a crucial point. Based on my current personal
> understanding, the Map<String, Catalog> catalogs will serve as a cache
> for instantiated catalogs and have the highest priority.
>
> There are three methods that can affect the Map<String, Catalog> catalogs:
>
> 1. registerCatalog(String catalogName, Catalog catalog)
>
> This method puts the catalog instance into the Map<String, Catalog>
> catalogs.
>
> 2. unregisterCatalog(String catalogName)
>
> This method removes the catalog instance corresponding to catalogName
> from the Map<String, Catalog> catalogs.
>
> 3. getCatalog(String catalogName)
>
> This method first retrieves the corresponding catalog instance from
> the Map<String, Catalog> catalogs. If the catalog does not exist, it
> retrieves the corresponding configuration from the CatalogStore,
> initializes it, and puts the initialized Catalog instance into the
> Map<String, Catalog> catalogs.
>
> The following two methods only modify the configuration in the
> CatalogStore:
>
> 1. addCatalog(String catalogName, Map<String, String> properties)
>
> This method saves the properties to the catalogStore and checks
> whether there is a catalogName with the same name.
>
> 2. removeCatalog(String catalogName)
> This method removes the specified configuration of the specified
> catalogName in the catalogStore.
>
> The following are possible conflict scenarios:
>
> 1. When the corresponding catalogName already exists in the
> CatalogStore but not in the Map<String, Catalog>, the
> registerCatalog(String catalogName, Catalog catalog) method can
> succeed and be directly saved to the Map<String, Catalog> catalogs.
>
> 2. When the corresponding catalogName already exists in both the
> CatalogStore and the Map<String, Catalog>, the registerCatalog(String
> catalogName, Catalog catalog) method will fail.
>
> 3. When the corresponding catalogName already exists in the
> Map<String, Catalog>, the addCatalog(String catalogName, Map<String,
> String> properties) method can directly save the properties to the
> catalogStore, but the getCatalog(String catalogName) method will not
> use the new properties for initialization because the corresponding
> catalog instance already exists in catalogs and will be prioritized.
> Therefore, using the unregisterCatalog(String catalogName) method to
> remove the instance corresponding to the original catalogName is
> necessary.
>
>
>
> > I think it will confuse users that `registerCatalog(String
> catalogName,Catalog catalog)` in the `Map<String, Catalog> catalogs` and
> `registerCatalog(String catalogName, Map<String, String> properties)
>
> This could potentially lead to confusion. I suggest changing the
> method name, perhaps to addCatalog(String catalogName, Map<String,
> String> properties), as previously mentioned
>
>
> @Hang
> > add `registerCatalog(String catalogName,Catalog catalog,
> boolean lazyInit)` method
>
> Since a catalog is already an instance, adding the "lazyInit"
> parameter to the registerCatalog(String catalogName, Catalog catalog)
> method may not necessarily result in lazy initialization.
>
> > Do we need to think about encryption
>
> I think encryption is necessary, but perhaps the encryption logic
> should be implemented in the CatalogStore when the data is actually
> saved. For instance, the FileCatalogStore could encrypt the data when
> it is saved to a file, while the MemoryCatalogStore does not require
> encryption.
>
> >  Do we really need the `MemoryCatalogStore`?
>
> I think it is necessary to have a MemoryCatalogStore as the default
> implementation, which saves Catalog configurations in memory.
> Otherwise, if we want to implement asynchronous loading of Catalog, we
> would need to introduce additional cache.
>
>
> @Xianxun
>
> > What if asynchronous registration failed?
>
> This is also a critical concern.  When executing DDL, DQL, or DML
> statements that reference a specific Catalog, the Catalog instance
> will be initialized if it has not been initialized yet. If the
> initialization process fails, these statements will not be executed
> successfully.
>
> > nt. The Map<String, Catalog> catalogs and CatalogStore catalogstore in
> the CatalogManager all have the same catalog name, but correponding to
> different catalog instances.
>
> This issue can be resolved by referring to the previous responses. The
> key principle is that the Catalog that has been most recently used
> will be given priority for subsequent use.
>
> On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <yesorno828...@gmail.com>
> wrote:
> >
> > Hi Feng,
> >
> > Thanks for driving this Flip, I do believe this Flip could be helpful
> for users. The firm I work for also manages a lot of catalogs, and the
> submission of tasks becomes slow because of loading a number of catalogs.
> We obtain the catalogs in the user's Flink SQL through regular expression
> to avoid loading all catalogs to improve the speed of task submission.
> After reading flip-295, I have some questions:
> >
> > 1. When creating a catalog with CREATE CATALOG, the asynchronous
> registration method is used by default. What if asynchronous registration
> failed? And how to implement asynchronous registration?
> >
> > 2. I also have the same question with Shammon’s first comment. The
> Map<String, Catalog> catalogs and CatalogStore catalogstore in the
> CatalogManager all have the same catalog name, but correponding to
> different catalog instances. For example: catalog1-> jdbcCatalog in the
> catalogs, but catalog1 -> hiveCatalog in the catalogstore. Would this case
> happen? Maybe we should define `catalogs` and `catalogstore` more clearly.
> >
> >
> > Best regards,
> >
> > Xianxun
> >
> > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > >
> > > MemoryCatalogStore
> >
> >
>

Reply via email to