Hello everyone!
Recently, we upgraded the AWS RDS instance from Postgres 12.14 to 15.4
and noticed extremely high disk consumption on the following query
execution:
select (exists (select 1 as "one" from "public"."indexed_commit" where
"public"."indexed_commit"."repo_id" in (964992,964994,964999, ...);
For some reason, the query planner starts using Seq Scan instead of the
index on the "repo_id" column when requesting under user limited with
RLS. On prod, it happens when there are more than 316 IDs in the IN part
of the query, on stage - 3. If we execute the request from Superuser,
the planner always uses the "repo_id" index.
Luckily, we can easily reproduce this on our stage database (which is
smaller). If we add a multicolumn "repo_id, tenant_id" index, the
planner uses it (Index Only Scan) with any IN params count under RLS.
Could you please clarify if this is a Postgres bug or not? Should we
include the "tenant_id" column in all our indexes to make them work
under RLS?
Postgres version / Operating system+version
PostgreSQL 15.4 on aarch64-unknown-linux-gnu, compiled by gcc (GCC)
7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit
Full Table and Index Schema
\d indexed_commit
Table "public.indexed_commit"
Column | Type | Collation | Nullable |
Default
---------------+-----------------------------+-----------+----------+---------
id | bigint | | not null |
commit_hash | character varying(40) | | not null |
parent_hash | text | | |
created_ts | timestamp without time zone | | not null |
repo_id | bigint | | not null |
lines_added | bigint | | |
lines_removed | bigint | | |
tenant_id | uuid | | not null |
author_id | uuid | | not null |
Indexes:
"indexed-commit-repo-idx" btree (repo_id)
"indexed_commit_commit_hash_repo_id_key" UNIQUE CONSTRAINT, btree
(commit_hash, repo_id) REPLICA IDENTITY
"indexed_commit_repo_id_without_loc_idx" btree (repo_id) WHERE
lines_added IS NULL OR lines_removed IS NULL
Policies:
POLICY "commit_isolation_policy"
USING ((tenant_id =
(current_setting('app.current_tenant_id'::text))::uuid))
Table Metadata
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
relname='indexed_commit';
relname | relpages | reltuples | relallvisible | relkind |
relnatts | relhassubclass | reloptions | pg_table_size
----------------+----------+--------------+---------------+---------+----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------+---------------
indexed_commit | 18170522 | 7.451964e+08 | 18104744 | r |
9 | f |
{autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=200000,autovacuum_analyze_threshold=100000}
| 148903337984
EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
Production queries:
316 ids under RLS limited user
<https://explain.depesz.com/s/X7Iq>
392 ids under RLS limited user <https://explain.depesz.com/s/lbkX>
392 ids under Superuser <https://explain.depesz.com/s/uKSG>
History
It became slow after the upgrade to 15.4. We never had any issues before.
Hardware
AWS DB class db.t4g.large + GP3 400GB disk
Maintenance Setup
Are you running autovacuum? Yes
If so, with what settings?
autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=200000,autovacuum_analyze_threshold=100000
SELECT * FROM pg_stat_user_tables WHERE relname='indexed_commit';
relid | schemaname | relname | seq_scan | seq_tup_read |
idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del |
n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |
n_ins_since_vacuum | last_vacuum | last_autovacuum |
last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
-------+------------+----------------+----------+--------------+-----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
24662 | public | indexed_commit | 2485 | 49215378424 |
374533865 | 4050928807 | 764089750 | 2191615 | 18500311
| 0 | 745241398 | 383 | 46018
| 45343 | | 2023-10-11 23:51:29.170378+00
| | 2023-10-11 23:50:18.922351+00 | 0
| 672 | 0 | 753
WAL Configuration
For data writing queries: have you moved the WAL to a different disk?
Changed the settings? No.
GUC Settings
What database configuration settings have you changed? We use default
settings.
What are their values?
SELECT * FROM pg_settings WHERE name IN ('effective_cache_size',
'shared_buffers', 'work_mem');
name | setting | unit | category |
short_desc | extra_desc | context | vartype | source |
min_val | max_val | enumvals | boot_val | reset_val | sourcefile |
sourceline | pending_restart
----------------------+---------+------+---------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+--------------------+---------+------------+----------+----------+-----------+------------+------------+-----------------
effective_cache_size | 494234 | 8kB | Query Tuning / Planner Cost
Constants | Sets the planner's assumption about the total size of the
data caches. | That is, the total size of the caches (kernel cache and
shared buffers) used for PostgreSQL data files. This is measured in disk
pages, which are normally 8 kB each. | user | integer |
configuration file | 1 | 2147483647 | | 524288 |
494234 | | | f
shared_buffers | 247117 | 8kB | Resource Usage /
Memory | Sets the number of shared memory buffers used by
the server. | | postmaster | integer | configuration file | 16 |
1073741823 | | 16384 | 247117 | | | f
work_mem | 4096 | kB | Resource Usage /
Memory | Sets the maximum memory to be used for query
workspaces. | This much memory can be used by each
internal sort operation and hash table before switching to temporary
disk files. | user
| integer | default | 64 | 2147483647 | |
4096 | 4096 | | | f
Statistics: n_distinct, MCV, histogram
Useful to check statistics leading to bad join plan. SELECT (SELECT
sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname,
inherited, null_frac, n_distinct, array_length(most_common_vals,1)
n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM
pg_stats WHERE attname='...' AND tablename='...' ORDER BY 1 DESC;
Returns 0 rows.
Kind regards,
Alexander