Hi all, I'd like to discuss a feature that supports Flink OLAP scenarios.
OLAP scenarios usually involve analytical queries whose running time is relatively short and which are therefore sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but every request to the catalog re-runs the underlying metadata query. We may need a cached catalog that caches table schemas and statistics to avoid these unnecessary repeated metadata requests.

The most straightforward scenario is using Flink batch SQL to query Hive data. With a cached Hive catalog, we would save a lot of round-trip latency with the Hive Metastore (HMS).

I have drafted a design doc about this:
https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing

Jira issue: https://issues.apache.org/jira/browse/FLINK-20416

IMO, this feature can further improve the stability and execution speed of analytical queries in Flink SQL.

Looking forward to your feedback; any discussion or comments are welcome.

--
With kind regards
------------------------------------------------------------
Sebastian Liu
Institute of Computing Technology, Chinese Academy of Sciences
Mobile/WeChat: +86 15201613655
E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
QQ: 3239559