Thanks Feng for driving this FLIP. 
I have few comments:
1: The mechanism of handling catalog with the same name looks a little of 
complex to me. I think it'll be better to explain it in the java doc of these 
methods and give a brief example in this FLIP.

2: TBH, the method name `addCatalog` still looks confused to me. IIUC, this 
method is for storing catalog to CatalogStore, how about renaming it to 
`storeCatalog`? It's very personal opinion, you can decide to take it or not by 
your self.

3: For CREATE CATALOG statement, which method will be called? `registerCatalog` 
or `addCatalog`? I'm wondering whether user can add a catalog to store with SQL 
stement.


3: Is it really neccessary to provide a default implmentation for interface 
`CatalogStoreFactory`? 


4: About asynchronous registration for catalog.

> When creating a catalog with CREATE CATALOG, the asynchronous registration 
> method is used by default.
If asynchronous registration is the default behavior, it there any way that 
user can switch to synchronous registration just like before?
Will both method `addCatalog` and `registerCatalog` be asynchronous 
registration?

IIUC, in asynchronous registration, it may well that CREATE CATALOG executes 
successfully, but then the following CREATE TABLE statement will fail for the 
catalog fail to open.
I think it's a break change which should be highlighted in this FLIP, may be in 
compatibility part.


BTW, by saying asynchronous registration, I would like to expect there will be 
an executor to open or register catalog in the background, but from your 
previous comments,
"the Catalog instance will be initialized if it has not been initialized yet. 
If the initialization process fails, these statements will not be executed 
successfully."
It looks more like lazy initialization for catalog than asynchronous 
registration, right?


Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Feng Jin" <jinfeng1...@gmail.com>
收件人: "dev" <dev@flink.apache.org>
发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and 
asynchronous registration

Hi everyone, Thank you all for your interest in this DISCUSS.

@Shammon
> How to handle a catalog with the same name that exists for both of them?

I believe this is a crucial point. Based on my current personal
understanding, the Map<String, Catalog> catalogs will serve as a cache
for instantiated catalogs and have the highest priority.

There are three methods that can affect the Map<String, Catalog> catalogs:

1. registerCatalog(String catalogName, Catalog catalog)

This method puts the catalog instance into the Map<String, Catalog> catalogs.

2. unregisterCatalog(String catalogName)

This method removes the catalog instance corresponding to catalogName
from the Map<String, Catalog> catalogs.

3. getCatalog(String catalogName)

This method first retrieves the corresponding catalog instance from
the Map<String, Catalog> catalogs. If the catalog does not exist, it
retrieves the corresponding configuration from the CatalogStore,
initializes it, and puts the initialized Catalog instance into the
Map<String, Catalog> catalogs.

The following two methods only modify the configuration in the CatalogStore:

1. addCatalog(String catalogName, Map<String, String> properties)

This method saves the properties to the catalogStore and checks
whether there is a catalogName with the same name.

2. removeCatalog(String catalogName)
This method removes the specified configuration of the specified
catalogName in the catalogStore.

The following are possible conflict scenarios:

1. When the corresponding catalogName already exists in the
CatalogStore but not in the Map<String, Catalog>, the
registerCatalog(String catalogName, Catalog catalog) method can
succeed and be directly saved to the Map<String, Catalog> catalogs.

2. When the corresponding catalogName already exists in both the
CatalogStore and the Map<String, Catalog>, the registerCatalog(String
catalogName, Catalog catalog) method will fail.

3. When the corresponding catalogName already exists in the
Map<String, Catalog>, the addCatalog(String catalogName, Map<String,
String> properties) method can directly save the properties to the
catalogStore, but the getCatalog(String catalogName) method will not
use the new properties for initialization because the corresponding
catalog instance already exists in catalogs and will be prioritized.
Therefore, using the unregisterCatalog(String catalogName) method to
remove the instance corresponding to the original catalogName is
necessary.



> I think it will confuse users that `registerCatalog(String 
> catalogName,Catalog catalog)` in the `Map<String, Catalog> catalogs` and 
> `registerCatalog(String catalogName, Map<String, String> properties)

This could potentially lead to confusion. I suggest changing the
method name, perhaps to addCatalog(String catalogName, Map<String,
String> properties), as previously mentioned


@Hang
> add `registerCatalog(String catalogName,Catalog catalog,
boolean lazyInit)` method

Since a catalog is already an instance, adding the "lazyInit"
parameter to the registerCatalog(String catalogName, Catalog catalog)
method may not necessarily result in lazy initialization.

> Do we need to think about encryption

I think encryption is necessary, but perhaps the encryption logic
should be implemented in the CatalogStore when the data is actually
saved. For instance, the FileCatalogStore could encrypt the data when
it is saved to a file, while the MemoryCatalogStore does not require
encryption.

>  Do we really need the `MemoryCatalogStore`?

I think it is necessary to have a MemoryCatalogStore as the default
implementation, which saves Catalog configurations in memory.
Otherwise, if we want to implement asynchronous loading of Catalog, we
would need to introduce additional cache.


@Xianxun

> What if asynchronous registration failed?

This is also a critical concern.  When executing DDL, DQL, or DML
statements that reference a specific Catalog, the Catalog instance
will be initialized if it has not been initialized yet. If the
initialization process fails, these statements will not be executed
successfully.

> nt. The Map<String, Catalog> catalogs and CatalogStore catalogstore in the 
> CatalogManager all have the same catalog name, but correponding to different 
> catalog instances.

This issue can be resolved by referring to the previous responses. The
key principle is that the Catalog that has been most recently used
will be given priority for subsequent use.

On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <yesorno828...@gmail.com> wrote:
>
> Hi Feng,
>
> Thanks for driving this Flip, I do believe this Flip could be helpful for 
> users. The firm I work for also manages a lot of catalogs, and the submission 
> of tasks becomes slow because of loading a number of catalogs. We obtain the 
> catalogs in the user's Flink SQL through regular expression to avoid loading 
> all catalogs to improve the speed of task submission. After reading flip-295, 
> I have some questions:
>
> 1. When creating a catalog with CREATE CATALOG, the asynchronous registration 
> method is used by default. What if asynchronous registration failed? And how 
> to implement asynchronous registration?
>
> 2. I also have the same question with Shammon’s first comment. The 
> Map<String, Catalog> catalogs and CatalogStore catalogstore in the 
> CatalogManager all have the same catalog name, but correponding to different 
> catalog instances. For example: catalog1-> jdbcCatalog in the catalogs, but 
> catalog1 -> hiveCatalog in the catalogstore. Would this case happen? Maybe we 
> should define `catalogs` and `catalogstore` more clearly.
>
>
> Best regards,
>
> Xianxun
>
> > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> >
> > MemoryCatalogStore
>
>

Reply via email to