Linux, CentOS 5.2, Postgres 8.3.4, 8.3.5. System tables and user tables listed
below have been VACUUM'd, ANALYZE'd and REINDEX'd.
Summary:
Simple update / delete queries that hit a parent table façade of a large
partitioned database are taking 15 to 20 seconds to plan (and a couple ms to
execute).
Worse, the backend will consume about 7GB of memory while planning (measured
with top as: Resident memory - shared memory ; it is released after its done).
The particular update or delete has a where clause that causes it to only
affect one table out of many, and going directly against the child table as
named will parse and plan the query in less than 1ms and consumes very little
memory. Triggers / rules are not used to modify behavior at all.
Workaround: Execute all queries against the table partitions, not the façade.
Non-trivial for anything spanning more than 1 partition.
Table information and definitions at the end.
Queries below, run locally with \timing on. The time it takes to explain them
is the same as it is to execute, there is nothing special about the actual
explain process causing a delay. I have simplified the case to minimalist
forms that demonstrate the issue.
Total tables in the system:
select count (*) from pg_tables;
count
-------
53427
Child tables of the table in question:
select count (*) from pg_tables where tablename like pp_logs%';
count
-------
6062
A simple select against one such table (a very small one). ~1 sec to plan, a
couple ms to execute. No noticeable jump in memory use while planning.
explain analyze select att from log.pp_logs WHERE s_id=23 AND date='2008-12-01'
AND p_id = 3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..24.12 rows=3 width=1) (actual time=0.267..0.267 rows=0
loops=1)
-> Append (cost=0.00..24.12 rows=3 width=1) (actual time=0.266..0.266
rows=0 loops=1)
-> Seq Scan on p_p_logs (cost=0.00..4.65 rows=1 width=1) (actual
time=0.048..0.048 rows=0 loops=1)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id =
3))
-> Seq Scan on p_p_logs_023_2008_12_01 p_p_logs (cost=0.00..19.47
rows=2 width=1) (actual time=0.218..0.218 rows=0 loops=1)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id =
3))
Total runtime: 4.393 ms
(7 rows)
Time: 1134.866 ms
The same in an UPDATE form, 20 seconds to plan, a couple ms to execute. 7GB of
memory used while planning (then released).
The memory and time consumed does not differ for explain versus explain analyze.
explain analyze UPDATE log.p_p_logs SET att=true
WHERE s_id=23 AND date='2008-12-01' AND p_id = 3::int;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..24.12 rows=3 width=181) (actual time=0.269..0.269 rows=0
loops=1)
-> Seq Scan on p_p_logs (cost=0.00..4.65 rows=1 width=124) (actual
time=0.045..0.045 rows=0 loops=1)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id = 3))
-> Seq Scan on p_p_logs_023_2008_12_01 p_p_logs (cost=0.00..19.47 rows=2
width=181) (actual time=0.221..0.221 rows=0 loops=1)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id = 3))
Total runtime: 4.246 ms
(6 rows)
Time: 17194.092 ms
A DELETE form. Similar to the above, uses ~ 6.7GB memory.
explain DELETE from log.p_p_logs WHERE s_id=23 AND date='2008-12-01' AND
p_id=-321;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Append (cost=0.00..24.12 rows=3 width=6)
-> Seq Scan on p_p_logs (cost=0.00..4.65 rows=1 width=6)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id =
(-321)))
-> Seq Scan on p_p_logs_023_2008_12_01 p_p_logs (cost=0.00..19.47 rows=2
width=6)
Filter: ((s_id = 23) AND (date = '2008-12-01'::date) AND (p_id =
(-321)))
(5 rows)
Time: 16680.702 ms
We have no triggers or rules for INSERT on the parent table façade. No rules
or triggers at all in relation to any partitioned tables, in fact. For INSERT
we go directly to the child tables. It would appear that we have to do this
for DELETE and UPDATE as well, and SELECT is expensive too. However, what
seems most worrying here is how much more expensive, and HUGELY memory
consuming it is for DELETE and UPDATE than SELECT. I would expect all of these
to have the same ~1 second of time to identify the candidate tables based on
the table constraints. But it seems like this table identification process is
somewhat slow for SELECT, and extremely bad for DELETE and UPDATE. Setting
work_mem or maintenance_work_mem to 10MB does not change the ~7GB of RAM used
to plan the query (our current settings are 800MB and 400MB, respectively).
When we go directly to the partition corresponding to the query, there is
virtually no query planning time at all. For example:
explain analyze SELECT att from p_log.p_p_logs_023_2008_12_01 WHERE p_id = 3;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on p_p_logs_023_2008_12_01 (cost=0.00..17.34 rows=2 width=1) (actual
time=0.125..0.125 rows=0 loops=1)
Filter: (p_id = 3)
Total runtime: 0.148 ms
(3 rows)
Time: 0.861 ms
explain analyze DELETE from p_log.p_p_logs_023_2008_12_01 WHERE p_id = 3;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on p_p_logs_023_2008_12_01 (cost=0.00..17.34 rows=2 width=6) (actual
time=0.125..0.125 rows=0 loops=1)
Filter: (p_id = 3)
Total runtime: 0.144 ms
(3 rows)
Time: 0.454 ms
explain analyze UPDATE p_log.p_p_logs_023_2008_12_01 SET att=true
WHERE p_id = 3::int;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Seq Scan on p_p_logs_023_2008_12_01 (cost=0.00..17.34 rows=2 width=181)
(actual time=0.119..0.119 rows=0 loops=1)
Filter: (p_id = 3)
Total runtime: 0.153 ms
(3 rows)
Time: 0.645 ms
Table definition, minus columns not in the above queries (about 12 other
columns, unimportant here):
p_log.p_p_logs_023_2008_12_01
Table "p_log.p_p_logs_023_2008_12_01"
Column | Type | Modifiers
---------------------+-----------------------------+---------------
s_id | bigint |
p_id | bigint |
date | date |
att | boolean | default false
Check constraints:
"p_p_logs_023_2008_12_01_check" CHECK (s_id = 23 AND date =
'2008-12-01'::date)
Inherits: log.p_p_logs
All ~6000 of the child tables are of this form, partitioned by one day, and one
s_id.