Hi Feng

Thanks for your answer.

I have no questions about the functionality of `CatalogStore`, but the
different operations of multiple `registerCatalog` or `storeCatalog` in
`CatalogManager`. The implementation in Trino is also the same, the
`CatalogManager` in trino only has one `createCatalog`, which will save the
catalog to memory and `CatalogStore`.

I think we don't need to add another `registerCatalog` or `addCatalog` to
save a catalog in `Map<String, Catalog> catalogs` or `CatalogStore
catalogStore`.  As you mentioned before, the `Map<String, Catalog>
catalogs` is a cache in `CatalogManager`. How about saving the catalog in
`Map<String, Catalog> catalogs` and `CatalogStore catalogStore` together
when it is registered or added in `CatalogManager`?

Best,
Shammon FY


On Tue, Apr 4, 2023 at 5:22 PM Feng Jin <jinfeng1...@gmail.com> wrote:

> Thank you for your reply. I am very sorry for the misunderstanding caused
> by my deviation from the original discussion.
>
> @Shammon
> > I found there is a pre-discussion [1] for this FLIP
> Yes, there was indeed such a discussion before.  However, in designing the
> whole solution, I found that the logic of CatalogManager itself doesn't
> need to change much. *We cannot only persist Catalog instances themselves*,
> so exposing only registerCatalog(String catalogName, Catalog catalog) might
> not be enough to save Catalogs, because in the end we still need to save
> the configurations corresponding to the Catalog instances.  Therefore, I
> decided to introduce the CatalogStore interface for configuration
> persistence. Regarding this part, I also took inspiration from Trino's
> implementation[1].
>
>
> @yuxia
>
> > 1: The mechanism of handling catalog with the same name looks a little of
> complex to me.
> Thank you for the suggestion. I will provide a detailed description and
> code examples for this part, and add it to the FLIP.
>
> > 2: TBH, the method name `addCatalog` still looks confused to me. IIUC,
> this method is for storing catalog to CatalogStore, how about renaming it
> to `storeCatalog`? It's very personal opinion, you can decide to take it or
> not by your self.
> StoreCatalog looks more intuitive to me. I don't see any problem with it.
>
> > 3.For CREATE CATALOG statement, which method will be called?
> `registerCatalog` or `addCatalog`? I'm wondering whether user can add a
> catalog to store with SQL stement.
> For CREATE CATALOG, my original design was to add it directly to the
> CatalogStore, but this would disrupt the existing logic. Therefore, I think
> we can do both: save the configuration to the CatalogStore and initialize
> the Catalog instance at the same time
>
> > 3. Is it really neccessary to provide a default implmentation for
> interface `CatalogStoreFactory`?
> I think it is necessary, otherwise we would need to introduce an additional
> Map to store the configuration for lazy loading.
>
> > 4: About asynchronous registration for catalog.
> I don't think registerCatalog(String catalogName, Catalog catalog) can be
> made into an asynchronous interface because Catalog is already an instance.
>
> > It looks more like lazy initialization for catalog than asynchronous
> registration, right?
> Indeed, my description was inaccurate. It should be lazy registration
> instead of asynchronous registration. I have already updated the title of
> the FLIP.
>
>
>
>
> [1].
>
> https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/connector/CatalogStore.java
>
>
> On Tue, Apr 4, 2023 at 4:27 PM Shammon FY <zjur...@gmail.com> wrote:
>
> > Hi Feng
> >
> > I think if there is a `registerCatalog` method in `CatalogManager`, it
> will
> > confuse users whether a method named `addCatalog` or `storeCatalog` is
> > added.
> >
> > And as you mentioned, the memory catalog is a `cache`, I think the
> concept
> > of `cache` should not be exposed to users.
> >
> > I found there is a pre-discussion [1] for this FLIP. Please correct me if
> > I'm wrong, IIUC, the conclusion of that discussion is to use
> > `CatalogManager` as an interface and implement it for different stores
> such
> > as memory, file and external system.
> >
> > I think there is a gap between the current FLIP design and that
> conclusion.
> > What about the proposal of the discussion in thread [1] ?
> >
> >
> > [1] https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70
> >
> >
> > Best,
> > Shammon FY
> >
> >
> > On Tue, Apr 4, 2023 at 3:41 PM yuxia <luoyu...@alumni.sjtu.edu.cn>
> wrote:
> >
> > > Thanks Feng for driving this FLIP.
> > > I have few comments:
> > > 1: The mechanism of handling catalog with the same name looks a little
> of
> > > complex to me. I think it'll be better to explain it in the java doc of
> > > these methods and give a brief example in this FLIP.
> > >
> > > 2: TBH, the method name `addCatalog` still looks confused to me. IIUC,
> > > this method is for storing catalog to CatalogStore, how about renaming
> it
> > > to `storeCatalog`? It's very personal opinion, you can decide to take
> it
> > or
> > > not by your self.
> > >
> > > 3: For CREATE CATALOG statement, which method will be called?
> > > `registerCatalog` or `addCatalog`? I'm wondering whether user can add a
> > > catalog to store with SQL stement.
> > >
> > >
> > > 3: Is it really neccessary to provide a default implmentation for
> > > interface `CatalogStoreFactory`?
> > >
> > >
> > > 4: About asynchronous registration for catalog.
> > >
> > > > When creating a catalog with CREATE CATALOG, the asynchronous
> > > registration method is used by default.
> > > If asynchronous registration is the default behavior, it there any way
> > > that user can switch to synchronous registration just like before?
> > > Will both method `addCatalog` and `registerCatalog` be asynchronous
> > > registration?
> > >
> > > IIUC, in asynchronous registration, it may well that CREATE CATALOG
> > > executes successfully, but then the following CREATE TABLE statement
> will
> > > fail for the catalog fail to open.
> > > I think it's a break change which should be highlighted in this FLIP,
> may
> > > be in compatibility part.
> > >
> > >
> > > BTW, by saying asynchronous registration, I would like to expect there
> > > will be an executor to open or register catalog in the background, but
> > from
> > > your previous comments,
> > > "the Catalog instance will be initialized if it has not been
> initialized
> > > yet. If the initialization process fails, these statements will not be
> > > executed successfully."
> > > It looks more like lazy initialization for catalog than asynchronous
> > > registration, right?
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- 原始邮件 -----
> > > 发件人: "Feng Jin" <jinfeng1...@gmail.com>
> > > 收件人: "dev" <dev@flink.apache.org>
> > > 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
> > > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog
> configuration
> > > and asynchronous registration
> > >
> > > Hi everyone, Thank you all for your interest in this DISCUSS.
> > >
> > > @Shammon
> > > > How to handle a catalog with the same name that exists for both of
> > them?
> > >
> > > I believe this is a crucial point. Based on my current personal
> > > understanding, the Map<String, Catalog> catalogs will serve as a cache
> > > for instantiated catalogs and have the highest priority.
> > >
> > > There are three methods that can affect the Map<String, Catalog>
> > catalogs:
> > >
> > > 1. registerCatalog(String catalogName, Catalog catalog)
> > >
> > > This method puts the catalog instance into the Map<String, Catalog>
> > > catalogs.
> > >
> > > 2. unregisterCatalog(String catalogName)
> > >
> > > This method removes the catalog instance corresponding to catalogName
> > > from the Map<String, Catalog> catalogs.
> > >
> > > 3. getCatalog(String catalogName)
> > >
> > > This method first retrieves the corresponding catalog instance from
> > > the Map<String, Catalog> catalogs. If the catalog does not exist, it
> > > retrieves the corresponding configuration from the CatalogStore,
> > > initializes it, and puts the initialized Catalog instance into the
> > > Map<String, Catalog> catalogs.
> > >
> > > The following two methods only modify the configuration in the
> > > CatalogStore:
> > >
> > > 1. addCatalog(String catalogName, Map<String, String> properties)
> > >
> > > This method saves the properties to the catalogStore and checks
> > > whether there is a catalogName with the same name.
> > >
> > > 2. removeCatalog(String catalogName)
> > > This method removes the specified configuration of the specified
> > > catalogName in the catalogStore.
> > >
> > > The following are possible conflict scenarios:
> > >
> > > 1. When the corresponding catalogName already exists in the
> > > CatalogStore but not in the Map<String, Catalog>, the
> > > registerCatalog(String catalogName, Catalog catalog) method can
> > > succeed and be directly saved to the Map<String, Catalog> catalogs.
> > >
> > > 2. When the corresponding catalogName already exists in both the
> > > CatalogStore and the Map<String, Catalog>, the registerCatalog(String
> > > catalogName, Catalog catalog) method will fail.
> > >
> > > 3. When the corresponding catalogName already exists in the
> > > Map<String, Catalog>, the addCatalog(String catalogName, Map<String,
> > > String> properties) method can directly save the properties to the
> > > catalogStore, but the getCatalog(String catalogName) method will not
> > > use the new properties for initialization because the corresponding
> > > catalog instance already exists in catalogs and will be prioritized.
> > > Therefore, using the unregisterCatalog(String catalogName) method to
> > > remove the instance corresponding to the original catalogName is
> > > necessary.
> > >
> > >
> > >
> > > > I think it will confuse users that `registerCatalog(String
> > > catalogName,Catalog catalog)` in the `Map<String, Catalog> catalogs`
> and
> > > `registerCatalog(String catalogName, Map<String, String> properties)
> > >
> > > This could potentially lead to confusion. I suggest changing the
> > > method name, perhaps to addCatalog(String catalogName, Map<String,
> > > String> properties), as previously mentioned
> > >
> > >
> > > @Hang
> > > > add `registerCatalog(String catalogName,Catalog catalog,
> > > boolean lazyInit)` method
> > >
> > > Since a catalog is already an instance, adding the "lazyInit"
> > > parameter to the registerCatalog(String catalogName, Catalog catalog)
> > > method may not necessarily result in lazy initialization.
> > >
> > > > Do we need to think about encryption
> > >
> > > I think encryption is necessary, but perhaps the encryption logic
> > > should be implemented in the CatalogStore when the data is actually
> > > saved. For instance, the FileCatalogStore could encrypt the data when
> > > it is saved to a file, while the MemoryCatalogStore does not require
> > > encryption.
> > >
> > > >  Do we really need the `MemoryCatalogStore`?
> > >
> > > I think it is necessary to have a MemoryCatalogStore as the default
> > > implementation, which saves Catalog configurations in memory.
> > > Otherwise, if we want to implement asynchronous loading of Catalog, we
> > > would need to introduce additional cache.
> > >
> > >
> > > @Xianxun
> > >
> > > > What if asynchronous registration failed?
> > >
> > > This is also a critical concern.  When executing DDL, DQL, or DML
> > > statements that reference a specific Catalog, the Catalog instance
> > > will be initialized if it has not been initialized yet. If the
> > > initialization process fails, these statements will not be executed
> > > successfully.
> > >
> > > > nt. The Map<String, Catalog> catalogs and CatalogStore catalogstore
> in
> > > the CatalogManager all have the same catalog name, but correponding to
> > > different catalog instances.
> > >
> > > This issue can be resolved by referring to the previous responses. The
> > > key principle is that the Catalog that has been most recently used
> > > will be given priority for subsequent use.
> > >
> > > On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <yesorno828...@gmail.com>
> > > wrote:
> > > >
> > > > Hi Feng,
> > > >
> > > > Thanks for driving this Flip, I do believe this Flip could be helpful
> > > for users. The firm I work for also manages a lot of catalogs, and the
> > > submission of tasks becomes slow because of loading a number of
> catalogs.
> > > We obtain the catalogs in the user's Flink SQL through regular
> expression
> > > to avoid loading all catalogs to improve the speed of task submission.
> > > After reading flip-295, I have some questions:
> > > >
> > > > 1. When creating a catalog with CREATE CATALOG, the asynchronous
> > > registration method is used by default. What if asynchronous
> registration
> > > failed? And how to implement asynchronous registration?
> > > >
> > > > 2. I also have the same question with Shammon’s first comment. The
> > > Map<String, Catalog> catalogs and CatalogStore catalogstore in the
> > > CatalogManager all have the same catalog name, but correponding to
> > > different catalog instances. For example: catalog1-> jdbcCatalog in the
> > > catalogs, but catalog1 -> hiveCatalog in the catalogstore. Would this
> > case
> > > happen? Maybe we should define `catalogs` and `catalogstore` more
> > clearly.
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Xianxun
> > > >
> > > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > > > >
> > > > > MemoryCatalogStore
> > > >
> > > >
> > >
> >
>

Reply via email to