On 4/2/24 10:23, Andy Fan wrote: > >> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote: >>> Rebased over 269b532ae and muted compiler warnings. > > Thank you Justin for the rebase! > > Hello Tomas, > > Thanks for the patch! Before I review the path at the code level, I want > to explain my understanding about this patch first. >
If you want to work on this patch, that'd be cool. A review would be great, but if you want to maybe take over and try moving it forward, that'd be even better. I don't know when I'll have time to work on it again, but I'd promise to help you with working on it. > Before this patch, we already use MCV information for the eqjoinsel, it > works as combine the MCV on the both sides to figure out the mcv_freq > and then treat the rest equally, but this doesn't work for MCV in > extended statistics, this patch fill this gap. Besides that, since > extended statistics means more than 1 columns are involved, if 1+ > columns are Const based on RestrictInfo, we can use such information to > filter the MCVs we are interesting, that's really cool. > Yes, I think that's an accurate description of what the patch does. > I did some more testing, all of them are inner join so far, all of them > works amazing and I am suprised this patch didn't draw enough > attention. I will test more after I go though the code. > I think it didn't go forward for a bunch of reasons: 1) I got distracted by something else requiring immediate attention, and forgot about this patch. 2) I got stuck on some detail of the patch, unsure which of the possible solutions to try first. 3) Uncertainty about how applicable the patch is in practice. I suppose it was some combination of these reasons, not sure. As for the "practicality" mentioned in (3), it's been a while since I worked on the patch so I don't recall the details, but I think I've been thinking mostly about "start join" queries, where a big "fact" table joins to small dimensions. And in that case the fact table may have a MCV, but the dimensions certainly don't have any (because the join happens on a PK). But maybe that's a wrong way to think about it - it was clearly useful to consider the case with (per-attribute) MCVs on both sides as worth special handling. So why not to do that for multi-column MCVs, right? > At for the code level, I reviewed them in the top-down manner and almost > 40% completed. Here are some findings just FYI. For efficiency purpose, > I provide each feedback with a individual commit, after all I want to > make sure my comment is practical and coding and testing is a good way > to archive that. I tried to make each of them as small as possible so > that you can reject or accept them convinently. > > 0001 is your patch, I just rebase them against the current master. 0006 > is not much relevant with current patch, and I think it can be committed > individually if you are OK with that. > > Hope this kind of review is helpful. > Cool! There's obviously no chance to get this into v18, and I have stuff to do in this CF. But I'll take a look after that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company