GIN multi-column index

2018-10-03 Thread Floris Van Nee
Hi,

I've been looking at using GIN indices on some of my tables and was wondering about their behavior when used as a multi-column index. Suppose we have a table 't' with column 'a' as int and column 'b' as int[], and a query like:

select * from t where a=1 and b@>'{2}'

If I define a GIN index on (a, b) (using the contrib extension btree_gin), will this be a significant improvement over a GIN index on just b? By significant I mean an improvement comparable to what you would expect from a multi-column btree index where the first column is tested for equality and the second for an inequality. I know that in a btree index the speedup is generally significant as long as the left-most index column has an equality comparison, due to the way btree indices work, but I couldn't find anything in the docs about how GIN indices behave in such a situation, or why.
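
For reference, this is roughly the setup I have in mind (the names here are just illustrative, not my actual schema):

create extension if not exists btree_gin;

create table t (
   a int,
   b int[]
);

-- option 1: GIN index on the array column only
create index t_b_gin on t using gin (b);

-- option 2: multi-column GIN index; btree_gin supplies the GIN operator
-- class needed for the plain int column
create index t_a_b_gin on t using gin (a, b);

-- the query in question, to compare plans with either index
explain select * from t where a = 1 and b @> '{2}';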

-Floris


jsonb_set() strictness considered harmful to data

2019-10-20 Thread Floris Van Nee
FWIW I've been bitten by this 'feature' more than once as well, accidentally erasing a column. Now I usually write something like js = jsonb_set(js, path, coalesce(new_value, 'null'::jsonb)) to prevent erasing the whole column and instead set the key to a jsonb null value, but I also found the STRICT behavior very surprising at first.
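
To illustrate the surprise for anyone who hasn't run into it (a minimal example, not my actual schema):

-- jsonb_set() is STRICT, so a NULL new value makes the whole expression NULL
select jsonb_set('{"a": 1}'::jsonb, '{b}', null::jsonb);
-- result: NULL (the entire document is gone)

-- coalescing the new value into a jsonb null keeps the rest of the document
select jsonb_set('{"a": 1}'::jsonb, '{b}', coalesce(null::jsonb, 'null'::jsonb));
-- result: {"a": 1, "b": null}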


-Floris



'too many range table entries' error with partitioned tables and aggregations

2019-06-21 Thread Floris Van Nee
Hi all,


Recently we hit a strange 'too many range table entries' error for some particular queries in our database. Some googling revealed that it's an uncommon error, usually caused by unusual database design [1]. However, I feel the case in which it occurs this time, although uncommon, doesn't look that strange in itself. I've produced a simple example that reproduces it on both PG11 and PG12 for me. It involves creating a partitioned table and then selecting from it inside a subquery, combined with a lot of aggregations.


Note that this is just a minimal example that triggers the error. We create one regular table and one partitioned table with 1024 partitions, and then run a query that

a) selects from the regular table,

b) lateral joins the partitioned table to look up just a single row and column, referencing a column from the first table and doing the lookup inside some subplan (coalesce, case when, etc.), and

c) does a lot of aggregations - in this simple case it just aggregates the same column many times, but imagine a table with a lot of different columns that we want to sum/avg etc.


In this case, the planner generates a plan with one SubPlan per aggregate (see the EXPLAIN output below). Each SubPlan initially has to create a range table entry for every partition, so 10 aggregations over 1024 partitions already put 10240 entries in the range table. You can imagine it's pretty easy to hit the ~65k limit on range table entries like this: the 64 aggregations in the example below need 64 * 1024 = 65536 partition entries, just over the limit, while 63 stay under it. Of course, all but one partition per SubPlan get pruned away by partition pruning, but that pruning happens too late for clauses such as now() in this example, which can only be pruned at execution time. On PG11 this error even occurs when pruning on a static timestamp, but on PG12 that particular case is fixed thanks to the recent work on partition pruning at planning time.


I was wondering if the developers here are aware of this issue. In the previous threads I read about it, it always looked like a far-fetched scenario that was very difficult to hit. This one (although still not trivial?) seems potentially more common. In any case, it looks inefficient to me that SubPlans are chosen here when they lead to so many references in the range table. I understand a lot of this code probably predates partitioning, when it was nearly impossible to hit these kinds of issues, but we might see more of this in the future.


I'm not really looking for a particular solution to my problem here (we can rewrite our queries to avoid these SubPlans for now, so that's not a problem; a sketch of such a rewrite follows after the EXPLAIN output below). I just wanted to share my findings to see whether the community also finds it strange that examples like this yield errors and whether they should be fixed, or whether users should work around them on their own by rewriting their queries not to use SubPlans for partitioned tables.


-- create example tables: p1 as a regular table, q1 as a partitioned table with 1024 partitions
drop table if exists p1;
create table p1 (
   ts timestamptz,
   p int
);

drop table if exists q1;
create table q1 (
   ts timestamptz,
   q int
) partition by range(ts);

do $an$
DECLARE
   i int;
begin
   for i in select * from generate_series(1, 1024) loop
      execute format($f$ create table q1_%s partition of q1 for values from ('%s') to ('%s') $f$,
                     i, '2015-04-04'::date - i, '2015-04-04'::date - i + 1);
   end loop;
end;
$an$;

-- this yields the error. removing one aggregation runs fine
explain
select
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q),
   sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q), sum(p*q)
from p1
cross join lateral (
   select coalesce((select q from q1 where ts = now() and ts > p1.ts limit 1), 1) q
) _
;

Example EXPLAIN output if we do just 2 sum(p*q) aggregations instead of 64:

Aggregate  (cost=62223.20..62223.21 rows=1 width=16)
  ->  Seq Scan on p1  (cost=0.00..30.40 rows=2040 width=12)
  SubPlan 2
    ->  Limit  (cost=0.00..15.24 rows=1 width=4)
          ->  Append  (cost=0.00..46812.16 rows=3072 width=4)
                Subplans Removed: 1023
                ->  Seq Scan on q1_1024 q1_1024_1  (cost=0.00..45.70 rows=3 width=4)
                      Filter: ((ts > p1.ts) AND (ts = now()))
  SubPlan 1
    ->  Limit  (cost=0.00..15.24 rows=1 width=4)
          ->  Append  (cost=0.00..46812.16 rows=3072 width=4)
                Subplans Removed: 1023
                ->  Seq Scan on q1_1024  (cost=0.00..45.70 rows=3 width=4)
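
And for completeness, a sketch of the kind of rewrite I mentioned above that avoids the per-aggregate SubPlans. This is just one possible formulation: the lookup moves into a single LEFT JOIN LATERAL and the coalesce moves into the aggregate expressions, so the partitioned table is referenced once in the range table instead of once per aggregate.

-- illustrative rewrite: one lateral lookup shared by all aggregates
explain
select
   sum(p * coalesce(q, 1)), sum(p * coalesce(q, 1))  -- and so on for the other aggregates
from p1
left join lateral (
   select q from q1 where ts = now() and ts > p1.ts limit 1
) _ on true
;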