Hi Feng

Thanks for your update.

I found there are two options in `Proposed Changes`, can you put the
unselected option in `Rejected Alternatives`? I think this may help us
better understand your proposal


Best,
Shammon FY


On Thu, Apr 13, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid> wrote:

> Hi Feng,
>
> Thanks for raising this FLIP. I am still confused after completely reading
> the thread with following questions:
>
> 1. Naming confusion - registerCatalog() and addCatalog() have no big
> difference based on their names. One of them is responsible for data
> persistence. How about persistCatalog()?
> 2. As you mentioned that Map<String, Catalog> catalogs is used as a cache
> and catalogStore is used for data persistence. I would suggest describing
> their purpose conceptually and clearly in the FLIP. Some common cache
> features should be implemented, i.e. data in the cache and the store should
> be consistent. Same Catalog instance should be found in the store and in
> the cache(either it has been initialized or it will be lazy initialized)
> for the same catalog name. The consistency will be taken care of while
> updating the catalog.
> 3. As the above discussion moves forward, the option 2 solution looks more
> like a replacement of option 1, because, afaiu, issues mentioned
> previously with option 1 are not solved yet. Do you still want to propose
> both options and ask for suggestions for both of them?
> 4. After you updated the FLIP, there are some inconsistent descriptions in
> the content.  Would you like to clean them up? Thanks!
>
> Best regards,
> Jing
>
>
> On Fri, Apr 7, 2023 at 9:24 AM Feng Jin <jinfeng1...@gmail.com> wrote:
>
> > hi Shammon
> >
> > Thank you for your response, and I completely agree with your point of
> > view.
> > Initially, I may have over complicated the whole issue. First and
> foremost,
> > we need to consider the persistence of the Catalog's Configuration.
> > If we only need to provide persistence for Catalog Configuration, we can
> > add a toConfiguration method to the Catalog interface.
> > This method can convert a Catalog instance to a Map<String, String>
> > properties, and the default implementation will throw an exception.
> >
> > public interface Catalog {
> >    /**
> >    * Returns a map containing the properties of the catalog object.
> >    *
> >    * @return a map containing the properties of the catalog object
> >    * @throws UnsupportedOperationException if the implementing class does
> > not override
> >    * the default implementation of this method
> >    */
> >   default Map<String, String> toProperties() {
> >     throw new UnsupportedOperationException("Please implement
> toProperties
> > for this catalog");
> >   }
> > }
> >
> > The specific process is as follows:
> >
> > 1.  If the user has configured a CatalogStore, the toConfiguration()
> method
> > will be called when registering a Catalog instance with
> > registerCatalog(String catalogName, Catalog catalog). The Catalog
> instance
> > will be converted to a Map<String, String> properties and saved to the
> > CatalogStore. If some Catalog instances do not implement this method, an
> > exception will be thrown.
> >
> > 2.  If the user does not use a CatalogStore, the toConfiguration() method
> > will not be called, ensuring consistency with the original process.
> >
> > 3. Saving both the Map<String, Catalog> catalogs and the CatalogStore at
> > the same time can also avoid conflicts
> >
> >
> > For lazy initialization:
> > we can start from the Catalog itself and provide a dedicated Catalog
> > implementation for lazy loading, such as ConfigurationCatalog
> >
> > public class ConfigurationCatalog implements Catalog {
> > ConfigurationCatalog(Map<String, String> properties) {
> > }
> > }
> >
> > I have added this design to the FLIP.
> >
> >
> > Best,
> > Feng
> >
> > On Thu, Apr 6, 2023 at 10:03 AM Shammon FY <zjur...@gmail.com> wrote:
> >
> > > Hi Feng
> > >
> > > Thanks for your answer.
> > >
> > > I have no questions about the functionality of `CatalogStore`, but the
> > > different operations of multiple `registerCatalog` or `storeCatalog` in
> > > `CatalogManager`. The implementation in Trino is also the same, the
> > > `CatalogManager` in trino only has one `createCatalog`, which will save
> > the
> > > catalog to memory and `CatalogStore`.
> > >
> > > I think we don't need to add another `registerCatalog` or `addCatalog`
> to
> > > save a catalog in `Map<String, Catalog> catalogs` or `CatalogStore
> > > catalogStore`.  As you mentioned before, the `Map<String, Catalog>
> > > catalogs` is a cache in `CatalogManager`. How about saving the catalog
> in
> > > `Map<String, Catalog> catalogs` and `CatalogStore catalogStore`
> together
> > > when it is registered or added in `CatalogManager`?
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > > On Tue, Apr 4, 2023 at 5:22 PM Feng Jin <jinfeng1...@gmail.com> wrote:
> > >
> > > > Thank you for your reply. I am very sorry for the misunderstanding
> > caused
> > > > by my deviation from the original discussion.
> > > >
> > > > @Shammon
> > > > > I found there is a pre-discussion [1] for this FLIP
> > > > Yes, there was indeed such a discussion before.  However, in
> designing
> > > the
> > > > whole solution, I found that the logic of CatalogManager itself
> doesn't
> > > > need to change much. *We cannot only persist Catalog instances
> > > themselves*,
> > > > so exposing only registerCatalog(String catalogName, Catalog catalog)
> > > might
> > > > not be enough to save Catalogs, because in the end we still need to
> > save
> > > > the configurations corresponding to the Catalog instances.
> Therefore,
> > I
> > > > decided to introduce the CatalogStore interface for configuration
> > > > persistence. Regarding this part, I also took inspiration from
> Trino's
> > > > implementation[1].
> > > >
> > > >
> > > > @yuxia
> > > >
> > > > > 1: The mechanism of handling catalog with the same name looks a
> > little
> > > of
> > > > complex to me.
> > > > Thank you for the suggestion. I will provide a detailed description
> and
> > > > code examples for this part, and add it to the FLIP.
> > > >
> > > > > 2: TBH, the method name `addCatalog` still looks confused to me.
> > IIUC,
> > > > this method is for storing catalog to CatalogStore, how about
> renaming
> > it
> > > > to `storeCatalog`? It's very personal opinion, you can decide to take
> > it
> > > or
> > > > not by your self.
> > > > StoreCatalog looks more intuitive to me. I don't see any problem with
> > it.
> > > >
> > > > > 3.For CREATE CATALOG statement, which method will be called?
> > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can
> add a
> > > > catalog to store with SQL stement.
> > > > For CREATE CATALOG, my original design was to add it directly to the
> > > > CatalogStore, but this would disrupt the existing logic. Therefore, I
> > > think
> > > > we can do both: save the configuration to the CatalogStore and
> > initialize
> > > > the Catalog instance at the same time
> > > >
> > > > > 3. Is it really neccessary to provide a default implmentation for
> > > > interface `CatalogStoreFactory`?
> > > > I think it is necessary, otherwise we would need to introduce an
> > > additional
> > > > Map to store the configuration for lazy loading.
> > > >
> > > > > 4: About asynchronous registration for catalog.
> > > > I don't think registerCatalog(String catalogName, Catalog catalog)
> can
> > be
> > > > made into an asynchronous interface because Catalog is already an
> > > instance.
> > > >
> > > > > It looks more like lazy initialization for catalog than
> asynchronous
> > > > registration, right?
> > > > Indeed, my description was inaccurate. It should be lazy registration
> > > > instead of asynchronous registration. I have already updated the
> title
> > of
> > > > the FLIP.
> > > >
> > > >
> > > >
> > > >
> > > > [1].
> > > >
> > > >
> > >
> >
> https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/connector/CatalogStore.java
> > > >
> > > >
> > > > On Tue, Apr 4, 2023 at 4:27 PM Shammon FY <zjur...@gmail.com> wrote:
> > > >
> > > > > Hi Feng
> > > > >
> > > > > I think if there is a `registerCatalog` method in `CatalogManager`,
> > it
> > > > will
> > > > > confuse users whether a method named `addCatalog` or `storeCatalog`
> > is
> > > > > added.
> > > > >
> > > > > And as you mentioned, the memory catalog is a `cache`, I think the
> > > > concept
> > > > > of `cache` should not be exposed to users.
> > > > >
> > > > > I found there is a pre-discussion [1] for this FLIP. Please correct
> > me
> > > if
> > > > > I'm wrong, IIUC, the conclusion of that discussion is to use
> > > > > `CatalogManager` as an interface and implement it for different
> > stores
> > > > such
> > > > > as memory, file and external system.
> > > > >
> > > > > I think there is a gap between the current FLIP design and that
> > > > conclusion.
> > > > > What about the proposal of the discussion in thread [1] ?
> > > > >
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70
> > > > >
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > >
> > > > > On Tue, Apr 4, 2023 at 3:41 PM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > > wrote:
> > > > >
> > > > > > Thanks Feng for driving this FLIP.
> > > > > > I have few comments:
> > > > > > 1: The mechanism of handling catalog with the same name looks a
> > > little
> > > > of
> > > > > > complex to me. I think it'll be better to explain it in the java
> > doc
> > > of
> > > > > > these methods and give a brief example in this FLIP.
> > > > > >
> > > > > > 2: TBH, the method name `addCatalog` still looks confused to me.
> > > IIUC,
> > > > > > this method is for storing catalog to CatalogStore, how about
> > > renaming
> > > > it
> > > > > > to `storeCatalog`? It's very personal opinion, you can decide to
> > take
> > > > it
> > > > > or
> > > > > > not by your self.
> > > > > >
> > > > > > 3: For CREATE CATALOG statement, which method will be called?
> > > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can
> > > add a
> > > > > > catalog to store with SQL stement.
> > > > > >
> > > > > >
> > > > > > 3: Is it really neccessary to provide a default implmentation for
> > > > > > interface `CatalogStoreFactory`?
> > > > > >
> > > > > >
> > > > > > 4: About asynchronous registration for catalog.
> > > > > >
> > > > > > > When creating a catalog with CREATE CATALOG, the asynchronous
> > > > > > registration method is used by default.
> > > > > > If asynchronous registration is the default behavior, it there
> any
> > > way
> > > > > > that user can switch to synchronous registration just like
> before?
> > > > > > Will both method `addCatalog` and `registerCatalog` be
> asynchronous
> > > > > > registration?
> > > > > >
> > > > > > IIUC, in asynchronous registration, it may well that CREATE
> CATALOG
> > > > > > executes successfully, but then the following CREATE TABLE
> > statement
> > > > will
> > > > > > fail for the catalog fail to open.
> > > > > > I think it's a break change which should be highlighted in this
> > FLIP,
> > > > may
> > > > > > be in compatibility part.
> > > > > >
> > > > > >
> > > > > > BTW, by saying asynchronous registration, I would like to expect
> > > there
> > > > > > will be an executor to open or register catalog in the
> background,
> > > but
> > > > > from
> > > > > > your previous comments,
> > > > > > "the Catalog instance will be initialized if it has not been
> > > > initialized
> > > > > > yet. If the initialization process fails, these statements will
> not
> > > be
> > > > > > executed successfully."
> > > > > > It looks more like lazy initialization for catalog than
> > asynchronous
> > > > > > registration, right?
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Yuxia
> > > > > >
> > > > > > ----- 原始邮件 -----
> > > > > > 发件人: "Feng Jin" <jinfeng1...@gmail.com>
> > > > > > 收件人: "dev" <dev@flink.apache.org>
> > > > > > 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
> > > > > > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog
> > > > configuration
> > > > > > and asynchronous registration
> > > > > >
> > > > > > Hi everyone, Thank you all for your interest in this DISCUSS.
> > > > > >
> > > > > > @Shammon
> > > > > > > How to handle a catalog with the same name that exists for both
> > of
> > > > > them?
> > > > > >
> > > > > > I believe this is a crucial point. Based on my current personal
> > > > > > understanding, the Map<String, Catalog> catalogs will serve as a
> > > cache
> > > > > > for instantiated catalogs and have the highest priority.
> > > > > >
> > > > > > There are three methods that can affect the Map<String, Catalog>
> > > > > catalogs:
> > > > > >
> > > > > > 1. registerCatalog(String catalogName, Catalog catalog)
> > > > > >
> > > > > > This method puts the catalog instance into the Map<String,
> Catalog>
> > > > > > catalogs.
> > > > > >
> > > > > > 2. unregisterCatalog(String catalogName)
> > > > > >
> > > > > > This method removes the catalog instance corresponding to
> > catalogName
> > > > > > from the Map<String, Catalog> catalogs.
> > > > > >
> > > > > > 3. getCatalog(String catalogName)
> > > > > >
> > > > > > This method first retrieves the corresponding catalog instance
> from
> > > > > > the Map<String, Catalog> catalogs. If the catalog does not exist,
> > it
> > > > > > retrieves the corresponding configuration from the CatalogStore,
> > > > > > initializes it, and puts the initialized Catalog instance into
> the
> > > > > > Map<String, Catalog> catalogs.
> > > > > >
> > > > > > The following two methods only modify the configuration in the
> > > > > > CatalogStore:
> > > > > >
> > > > > > 1. addCatalog(String catalogName, Map<String, String> properties)
> > > > > >
> > > > > > This method saves the properties to the catalogStore and checks
> > > > > > whether there is a catalogName with the same name.
> > > > > >
> > > > > > 2. removeCatalog(String catalogName)
> > > > > > This method removes the specified configuration of the specified
> > > > > > catalogName in the catalogStore.
> > > > > >
> > > > > > The following are possible conflict scenarios:
> > > > > >
> > > > > > 1. When the corresponding catalogName already exists in the
> > > > > > CatalogStore but not in the Map<String, Catalog>, the
> > > > > > registerCatalog(String catalogName, Catalog catalog) method can
> > > > > > succeed and be directly saved to the Map<String, Catalog>
> catalogs.
> > > > > >
> > > > > > 2. When the corresponding catalogName already exists in both the
> > > > > > CatalogStore and the Map<String, Catalog>, the
> > registerCatalog(String
> > > > > > catalogName, Catalog catalog) method will fail.
> > > > > >
> > > > > > 3. When the corresponding catalogName already exists in the
> > > > > > Map<String, Catalog>, the addCatalog(String catalogName,
> > Map<String,
> > > > > > String> properties) method can directly save the properties to
> the
> > > > > > catalogStore, but the getCatalog(String catalogName) method will
> > not
> > > > > > use the new properties for initialization because the
> corresponding
> > > > > > catalog instance already exists in catalogs and will be
> > prioritized.
> > > > > > Therefore, using the unregisterCatalog(String catalogName) method
> > to
> > > > > > remove the instance corresponding to the original catalogName is
> > > > > > necessary.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > I think it will confuse users that `registerCatalog(String
> > > > > > catalogName,Catalog catalog)` in the `Map<String, Catalog>
> > catalogs`
> > > > and
> > > > > > `registerCatalog(String catalogName, Map<String, String>
> > properties)
> > > > > >
> > > > > > This could potentially lead to confusion. I suggest changing the
> > > > > > method name, perhaps to addCatalog(String catalogName,
> Map<String,
> > > > > > String> properties), as previously mentioned
> > > > > >
> > > > > >
> > > > > > @Hang
> > > > > > > add `registerCatalog(String catalogName,Catalog catalog,
> > > > > > boolean lazyInit)` method
> > > > > >
> > > > > > Since a catalog is already an instance, adding the "lazyInit"
> > > > > > parameter to the registerCatalog(String catalogName, Catalog
> > catalog)
> > > > > > method may not necessarily result in lazy initialization.
> > > > > >
> > > > > > > Do we need to think about encryption
> > > > > >
> > > > > > I think encryption is necessary, but perhaps the encryption logic
> > > > > > should be implemented in the CatalogStore when the data is
> actually
> > > > > > saved. For instance, the FileCatalogStore could encrypt the data
> > when
> > > > > > it is saved to a file, while the MemoryCatalogStore does not
> > require
> > > > > > encryption.
> > > > > >
> > > > > > >  Do we really need the `MemoryCatalogStore`?
> > > > > >
> > > > > > I think it is necessary to have a MemoryCatalogStore as the
> default
> > > > > > implementation, which saves Catalog configurations in memory.
> > > > > > Otherwise, if we want to implement asynchronous loading of
> Catalog,
> > > we
> > > > > > would need to introduce additional cache.
> > > > > >
> > > > > >
> > > > > > @Xianxun
> > > > > >
> > > > > > > What if asynchronous registration failed?
> > > > > >
> > > > > > This is also a critical concern.  When executing DDL, DQL, or DML
> > > > > > statements that reference a specific Catalog, the Catalog
> instance
> > > > > > will be initialized if it has not been initialized yet. If the
> > > > > > initialization process fails, these statements will not be
> executed
> > > > > > successfully.
> > > > > >
> > > > > > > nt. The Map<String, Catalog> catalogs and CatalogStore
> > catalogstore
> > > > in
> > > > > > the CatalogManager all have the same catalog name, but
> correponding
> > > to
> > > > > > different catalog instances.
> > > > > >
> > > > > > This issue can be resolved by referring to the previous
> responses.
> > > The
> > > > > > key principle is that the Catalog that has been most recently
> used
> > > > > > will be given priority for subsequent use.
> > > > > >
> > > > > > On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <
> > yesorno828...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Feng,
> > > > > > >
> > > > > > > Thanks for driving this Flip, I do believe this Flip could be
> > > helpful
> > > > > > for users. The firm I work for also manages a lot of catalogs,
> and
> > > the
> > > > > > submission of tasks becomes slow because of loading a number of
> > > > catalogs.
> > > > > > We obtain the catalogs in the user's Flink SQL through regular
> > > > expression
> > > > > > to avoid loading all catalogs to improve the speed of task
> > > submission.
> > > > > > After reading flip-295, I have some questions:
> > > > > > >
> > > > > > > 1. When creating a catalog with CREATE CATALOG, the
> asynchronous
> > > > > > registration method is used by default. What if asynchronous
> > > > registration
> > > > > > failed? And how to implement asynchronous registration?
> > > > > > >
> > > > > > > 2. I also have the same question with Shammon’s first comment.
> > The
> > > > > > Map<String, Catalog> catalogs and CatalogStore catalogstore in
> the
> > > > > > CatalogManager all have the same catalog name, but correponding
> to
> > > > > > different catalog instances. For example: catalog1-> jdbcCatalog
> in
> > > the
> > > > > > catalogs, but catalog1 -> hiveCatalog in the catalogstore. Would
> > this
> > > > > case
> > > > > > happen? Maybe we should define `catalogs` and `catalogstore` more
> > > > > clearly.
> > > > > > >
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Xianxun
> > > > > > >
> > > > > > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > > > > > > >
> > > > > > > > MemoryCatalogStore
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to