optimisation? collation "C" sorting for GroupAggregate for all deterministic collations
Hi All,

It is known that collation "C" significantly speeds up string comparisons and, as a result, sorting. I was wondering whether it is possible to use it, regardless of the collation set on a column, in sorts that are not visible to users. The example I have in mind is the sort performed for GroupAggregate. The purpose of that sort is to bring equal values next to each other, so as long as:

1) the user didn't request ORDER BY in addition to GROUP BY
2) the source column has any deterministic collation (as per the docs, all builtin collations are deterministic)

it seems to be possible to do the sorting with any deterministic collation, regardless of what the user specified for the column being sorted. The "C" collation is deterministic and fastest.

In other words, couldn't PostgreSQL convert this:

  -> GroupAggregate (cost=15726557.87..22944558.69 rows=721 width=176) (actual time=490103.209..771536.389 rows=3600 loops=1)
       Group Key: ec_180days.msn, ec_180days.to_date_time
       -> Sort (cost=15726557.87..15906557.89 rows=7208 width=113) (actual time=490094.849..524854.662 rows=7200 loops=1)
            Sort Key: ec_180days.msn, ec_180days.to_date_time
            Sort Method: external merge Disk: 7679136kB

to this:

  -> GroupAggregate (cost=14988274.87..22206275.69 rows=721 width=155) (actual time=140497.729..421510.001 rows=3600 loops=1)
       Group Key: ec_180days.msn, ec_180days.to_date_time
       -> Sort (cost=14988274.87..15168274.89 rows=7208 width=92) (actual time=140489.807..174228.722 rows=7200 loops=1)
            Sort Key: ec_180days.msn COLLATE "C", ec_180days.to_date_time
            Sort Method: external merge Disk: 7679136kB

which is 3 times faster in my tests.
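The reasoning above can be sketched with a toy Python illustration (not PostgreSQL internals): as long as equality is byte-wise (deterministic), any total order that is consistent with that equality puts equal keys next to each other, so grouping after a byte-order ("C"-style) sort produces the same groups as grouping after a sort in the column's own collation.

```python
# Toy model: grouping after two different sort orders yields the same
# groups, because both orders place byte-equal keys adjacently.
from itertools import groupby

rows = ["Äpfel", "zebra", "Äpfel", "apple", "zebra", "apple"]

codepoint_sorted = sorted(rows)  # stand-in for the column's collation
byte_sorted = sorted(rows, key=lambda s: s.encode("utf-8"))  # "C"-style byte order

def groups(seq):
    # GroupAggregate only needs equal keys to be adjacent; the relative
    # order of distinct groups is irrelevant when there is no ORDER BY.
    return sorted((k, len(list(g))) for k, g in groupby(seq))

assert groups(codepoint_sorted) == groups(byte_sorted)
```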
Re: optimisation? collation "C" sorting for GroupAggregate for all deterministic collations
> Perhaps this is what you mean by "deterministic", but isn't it
> possible for some collations to treat multiple byte sequences as equal
> values? And those multiple byte sequences wouldn't necessarily occur
> sequentially in C collation, so it wouldn't be possible to work around
> that by having the grouping node use one collation but the sorting
> node use the C one.
>
> If my memory is incorrect, then this sounds like an intriguing idea.

Yes, as per the docs (https://www.postgresql.org/docs/12/collation.html#COLLATION-NONDETERMINISTIC) some collations can consider symbols (chars? codes? runes?) equal even though their byte representations are not. This optimisation should check the source table's collation and not change the sorting collation if the columns being sorted use a non-deterministic collation. Luckily, in practice this is probably very rare: all builtin collations are deterministic.
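The hazard is easy to show with a toy Python counterexample (illustrative only, not PostgreSQL internals): under a case-insensitive "collation", "ABC" and "abc" are equal, yet byte order does not place them next to each other, so a GroupAggregate over the byte-ordered input would emit an extra group.

```python
# Toy model of a non-deterministic collation (case-insensitive equality):
# sorting in byte order splits a group that the collation considers equal.
from itertools import groupby

rows = ["ABC", "ab", "abc"]

ci_sorted = sorted(rows, key=str.casefold)  # the non-deterministic "collation"
c_sorted = sorted(rows)                     # "C"-style byte/codepoint order

ci_groups = [k for k, _ in groupby(ci_sorted, key=str.casefold)]
c_groups = [k for k, _ in groupby(c_sorted, key=str.casefold)]

assert ci_groups == ["ab", "abc"]        # 2 groups: "ABC" and "abc" merged
assert c_groups == ["abc", "ab", "abc"]  # 3 groups: "ABC" and "abc" split apart
```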
Use of additional index columns in rows filtering
Hi All,

I'd like to report what seems to be a missed optimization opportunity, or understand why it is not possible to achieve.

TLDR; an additional index column B specified in CREATE INDEX .. (A) INCLUDE (B) is not used to filter rows in queries like WHERE B = $1 ORDER BY A during an IndexScan. https://dbfiddle.uk/iehtq44L

Take the following database:

  CREATE TABLE t(
    a integer NOT NULL,
    b integer NOT NULL,
    d integer NOT NULL
  );

  CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
  -- I'd expect the index above to behave as the index below for the
  -- purpose of this query
  -- CREATE UNIQUE INDEX ON t(a,b);

  INSERT INTO t(
    SELECT random() * 1 as a, random() * 3 as b, generate_series as d
    FROM generate_series(1,20)
  ) ON CONFLICT DO NOTHING;

If we filter on the `a` and `b` columns while scanning the index created as `(a) INCLUDE (b)`, it seems to be fetching tuples from the heap to check the condition `b = 4`, despite both columns being available in the index:

  SELECT * FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

Here is the plan (notice the high "shared hit"):

  Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=198307
    -> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
         Output: a, b, d
         Index Cond: (t.a > 100)
         Filter: (t.b = 4)
         Rows Removed by Filter: 197805
         Buffers: shared hit=198307
  Planning:
    Buffers: shared hit=30
  Planning Time: 0.201 ms
  Execution Time: 84.303 ms

And here is the plan with an index on (a,b).
  Limit (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=613
    -> Index Scan using t_a_b_idx on public.t (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
         Output: a, b, d
         Index Cond: ((t.a > 100) AND (t.b = 4))
         Buffers: shared hit=613
  Planning:
    Buffers: shared hit=41
  Planning Time: 0.314 ms
  Execution Time: 6.910 ms

Because the query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect the indexes `(a) INCLUDE (b)` and `(a,b)` to behave exactly the same with this particular query.

Interestingly, an IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap, but only for visible tuples:

  VACUUM FREEZE t;
  SELECT a,b FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

  Limit (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
    Output: a, b
    Buffers: shared hit=662
    -> Index Only Scan using t_a_include_b on public.t (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
         Output: a, b
         Index Cond: (t.a > 100)
         Filter: (t.b = 4)
         Rows Removed by Filter: 197771
         Heap Fetches: 0
         Buffers: shared hit=662

Removing the VACUUM makes it behave like an IndexScan and fetch candidate tuples from the heap, all while returning zero rows in the result.

To make the query plans comparable I had to force an index scan on both with:

  SET enable_bitmapscan to off;
  SET enable_seqscan to off;
  SET max_parallel_workers_per_gather = 0;

A self-contained, fully reproducible example is at https://dbfiddle.uk/iehtq44L

Regards,
Maxim
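The access-pattern difference can be modeled with a toy Python sketch (illustrative only; it ignores visibility checks and everything else that makes the real question subtle): an index entry carrying (a, b, tid) lets the scan reject non-matching rows before touching the heap, while filtering after the fetch touches the heap for every candidate.

```python
# Toy model of the two scan strategies over an index "(a) INCLUDE (b)".
# heap: rows (a, b, d); index: entries (a, b, tid). b is always in 0..2,
# so the filter "b == 4" matches nothing, as in the example query.
heap = [(a, a % 3, a) for a in range(1000)]
index = [(a, b, tid) for tid, (a, b, _) in enumerate(heap)]

def scan_filter_after_fetch(index, heap):
    """Current IndexScan behaviour: fetch the heap tuple, then filter."""
    fetches, out = 0, []
    for a, b, tid in index:
        if a > 100:
            fetches += 1          # heap access happens before the filter
            row = heap[tid]
            if row[1] == 4:
                out.append(row)
    return out, fetches

def scan_filter_on_index(index, heap):
    """Proposed behaviour: check the included column on the index tuple."""
    fetches, out = 0, []
    for a, b, tid in index:
        if a > 100 and b == 4:    # filter using the included column b
            fetches += 1
            out.append(heap[tid])
    return out, fetches

r1, f1 = scan_filter_after_fetch(index, heap)
r2, f2 = scan_filter_on_index(index, heap)
assert r1 == r2 == []             # same (empty) result either way
assert f1 == 899 and f2 == 0      # but only one variant skips the heap
```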
Re: Use of additional index columns in rows filtering
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Assuming operators are pure and don't have global side effects, is it possible to ignore any error during that check? If the tuple is not visible, the error shouldn't matter; if it is visible, the error will be reported by the same routine which does the filtering now (ExecQual?). If not, then limiting this optimization to builtin ops is something I can live with :)
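The proposal can be sketched in toy Python (names and control flow are illustrative, not PostgreSQL internals): evaluate a possibly-erroring qual speculatively against the index tuple, swallow any error there, and let the normal post-fetch filter re-evaluate it (and report the error) only for tuples that turn out to be visible.

```python
# Toy model: a data-dependent error raised against a dead index tuple is
# ignored; the same error on a visible tuple would surface via the normal
# filter path, exactly as it does today.
def qual(b):
    return 10 // b > 1           # raises ZeroDivisionError for b == 0

entries = [(5, True), (0, False), (2, True)]   # (included column, visible?)

matched = []
for b, visible in entries:
    try:
        if not qual(b):          # speculative check on the index tuple
            continue             # definitely filtered out: skip heap fetch
    except ZeroDivisionError:
        pass                     # ignore; fall through to the normal path
    if visible:                  # heap fetch + visibility check
        if qual(b):              # normal filter; re-raises for visible rows
            matched.append(b)

assert matched == [5, 2]         # the dead b == 0 tuple raised no error
```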