On Fri, Feb 28, 2020 at 3:02 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> > > On Fri, Feb 28, 2020 at 2:35 PM Richard Guo <guofengli...@gmail.com> > wrote: > >> Hi All, >> >> Currently we will not consider EXPR_SUBLINK when pulling up sublinks and >> this would cause performance issues for some queries with the form of: >> 'a > (SELECT agg(b) from ...)' as described in [1]. >> >> So here is a patch as an attempt to pull up EXPR SubLinks. The idea, >> which is based on Greenplum's implementation, is to perform the >> following transformation. >> >> For query: >> >> select * from foo where foo.a > >> (select avg(bar.a) from bar where foo.b = bar.b); >> >> we transform it to: >> >> select * from foo inner join >> (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub >> on foo.b = sub.b and foo.a > sub.avg; >> > > Glad to see this. I think the hard part is this transform is not *always* > good. for example foo.a only has 1 rows, but bar has a lot of rows, if > so > the original would be the better one. > Yes exactly. TBH I'm not sure how to achieve that. Currently in the patch this transformation happens in the stage of preprocessing the jointree. We do not have enough information at this time to tell which is better between the transformed one and untransformed one. If we want to choose the better one by cost comparison, then we need to plan the query twice, one for the transformed query and one for the untransformed query. But this seems infeasible in current optimizer's architecture. Any ideas on this part? Thanks Richard