Hi hackers Discussion[1] and the relevant commit[2] improved the selectivity calculation for IN/NOT IN.
This is the current logic for NOT IN selectivity calculation and it loops over the array elements. else { s1 = s1 * s2; if (isInequality) s1disjoint += s2 - 1.0; } By calculating s2 for each array element, it calls neqsel and returns 1 - eqsel - nullfrac. If I expand the s1disjoint calculation for a NOT IN (2,5,8) clause, It eventually becomes 1 - eqsel(2) - eqsel(5) - eqsel(8) - 3*nullfrac. If nullfrac is big, s1disjoint will be less than 0 quickly when the array has more elements, and the selectivity algorithm falls back to the one prior to commit[2] which had bad estimation for NOT IN as well. It seems to me that nullfrac should be subtracted only once. Is it feasible that we have a new variable s1disjoint2 that add back nullfrac when we get back the result for each s2 and subtract it once at the end of the loop as a 2nd heuristic? We then maybe prefer s1disjoint2 over s1disjoint and then s1? Donghang Lin (ServiceNow) [1] https://www.postgresql.org/message-id/flat/CA%2Bmi_8aPEAzBgWZpNTABGM%3DcSq7mRMyPWbMsU8eGmUfH75OTLA%40mail.gmail.com [2] https://github.com/postgres/postgres/commit/66a7e6bae98592d1d98d9ef589753f0e953c5828