[ https://issues.apache.org/jira/browse/FLINK-20416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245865#comment-17245865 ]
Sebastian Liu commented on FLINK-20416:
---------------------------------------

Hi [~jark], I appreciate your reply and suggestions. I totally agree that we should reach a consensus before the actual coding. The pull request in this ticket is an attempt we made during TPC-DS benchmarking, and we hope to share it with the community once it has improved the performance of some queries.

First, let me briefly answer some of your questions:

1. Is this a framework cache?
We hope it is not a framework cache, but a special general-purpose catalog, just like "GenericInMemoryCatalog". The difference is that this "GenericCachedCatalog" delegates requests to other kinds of catalogs and caches and updates the results gracefully. It is up to users to decide whether their catalog implementation should be delegated to "GenericCachedCatalog".

2. How to enable it? By job configuration?
To answer this question, I think we should first pin down the scenarios where the cache is useful. A long-running streaming job on a per-job cluster is not a good fit; batch SQL jobs submitted through the Flink SQL Gateway to a session-mode cluster are. So we can enable it in the relevant catalog properties, and the CatalogFactory can check those properties to decide whether to create the original catalog, or to create a "GenericCachedCatalog" and pass the original implementation in as its delegate. Strictly speaking, this should be a cluster-level configuration.

3. Caching in a specific catalog?
Yes, I agree too. In the PR we have added this "GenericCachedCatalog" for the Hive catalog, and it won't affect other catalogs that do not use it.
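A minimal Java sketch of the delegation idea in points 1 and 2, under explicit assumptions: {{MetadataCatalog}}, {{CachedCatalog}}, {{CatalogWrapping}} and the {{cache.enabled}} property are hypothetical names invented for illustration. The interface is a drastically reduced stand-in, not Flink's actual {{Catalog}} API, and this is not the code in the PR attached to this ticket. Reads are cached per table path and only reach the delegate on a miss, writes go through to the delegate and then invalidate the cached entries, and a factory step wraps the original catalog only when the property is enabled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// HYPOTHETICAL, heavily simplified metadata interface used only for this sketch.
// It is NOT Flink's org.apache.flink.table.catalog.Catalog interface.
interface MetadataCatalog {
    String getTableSchema(String tablePath);             // stand-in for a schema lookup
    long getRowCount(String tablePath);                  // stand-in for table statistics
    void alterTable(String tablePath, String newSchema); // a metadata write
}

// Hypothetical delegating cache in the spirit of the proposed "GenericCachedCatalog":
// reads are served from in-memory maps and only reach the delegate on a miss;
// writes go to the delegate first and then invalidate the affected entries.
final class CachedCatalog implements MetadataCatalog {
    private final MetadataCatalog delegate;
    private final Map<String, String> schemaCache = new ConcurrentHashMap<>();
    private final Map<String, Long> rowCountCache = new ConcurrentHashMap<>();

    CachedCatalog(MetadataCatalog delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getTableSchema(String tablePath) {
        // computeIfAbsent calls the delegate only when the key is not cached yet
        return schemaCache.computeIfAbsent(tablePath, delegate::getTableSchema);
    }

    @Override
    public long getRowCount(String tablePath) {
        return rowCountCache.computeIfAbsent(tablePath, delegate::getRowCount);
    }

    @Override
    public void alterTable(String tablePath, String newSchema) {
        delegate.alterTable(tablePath, newSchema);
        // Invalidate only after the delegate accepted the change, so the next read refetches.
        schemaCache.remove(tablePath);
        rowCountCache.remove(tablePath);
    }
}

// Hypothetical factory step: wrap the original catalog only when a made-up
// "cache.enabled" catalog property is "true"; otherwise return it unchanged.
final class CatalogWrapping {
    static final String CACHE_ENABLED_KEY = "cache.enabled"; // hypothetical property key

    private CatalogWrapping() {}

    static MetadataCatalog maybeWrap(MetadataCatalog original, Map<String, String> properties) {
        boolean enabled = Boolean.parseBoolean(properties.getOrDefault(CACHE_ENABLED_KEY, "false"));
        return enabled ? new CachedCatalog(original) : original;
    }
}
```

Invalidating only on writes that pass through the wrapper is the simplest consistency rule, and it is also why the usage scenario matters: if other clients change the metastore behind the cache, a long-running job would keep serving stale schemas or statistics. A real version would likely need expiry (for example a TTL) or an explicit refresh on top of this.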
In general, this cache implementation is similar to the corresponding implementation in Presto, and our goal is likewise to improve the OLAP performance of Flink batch SQL.

> Need a cached catalog for batch SQL job
> ---------------------------------------
>
> Key: FLINK-20416
> URL: https://issues.apache.org/jira/browse/FLINK-20416
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Common, Connectors / Hive, Table SQL / API, Table SQL / Planner
> Reporter: Sebastian Liu
> Priority: Major
> Labels: pull-request-available
>
> For OLAP scenarios, there are usually some analytical queries whose running time is relatively short. These queries are also sensitive to latency. In the current Blink SQL processing, the parse, validate and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.
>
> We may need a cached catalog that can cache table schemas and statistics to avoid unnecessary repeated metadata requests.
> I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of {{AbstractCatalog}}.
> {{[https://github.com/apache/flink/pull/14260]}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)