Hi Jing,

Thanks for driving this, LGTM.
Best,
Godfrey

Jingsong Li <jingsongl...@gmail.com> wrote on Fri, Jul 15, 2022 at 11:38:
>
> Thanks for starting this discussion.
>
> Have we considered introducing a listPartitionWithStats() in Catalog?
>
> Best,
> Jingsong
>
> On Fri, Jul 15, 2022 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Jing,
> >
> > Thanks for starting this discussion. The bulk fetch is a great improvement
> > for the optimizer.
> > The FLIP looks good to me.
> >
> > Best,
> > Jark
> >
> > On Fri, 8 Jul 2022 at 17:36, Jing Ge <j...@ververica.com> wrote:
> >
> > > Hi devs,
> > >
> > > After multiple discussions with Jark and Godfrey, I'd like to start a
> > > discussion on the mailing list about FLIP-247 [1], which will
> > > significantly improve performance by providing a bulk fetch capability
> > > for table and column statistics.
> > >
> > > Currently, table statistics can only be fetched from the catalog one
> > > partition at a time. Since getting statistics from catalogs is a very
> > > heavy operation, we'd better provide functionality to fetch the
> > > statistics of a table for all given partitions in one shot, in order to
> > > improve query performance.
> > >
> > > Based on a manual performance test with 2000 partitions, the cost drops
> > > from 10s to 2s, roughly a 5x speedup.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
> > >
> > > Best regards,
> > > Jing