Bitmap heap scan performance

2019-08-09 Thread Rob Emery
Hi Guys,

I’m at a bit of a loss where I can go with the following 2 queries
that are over the same data structure (DDL attached) under postgresql
PostgreSQL 9.5.16 on x86_64-pc-linux-gnu (Debian 9.5.16-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit and
could do with a second set of eyes if someone would oblige.

I’ve attached Query1.txt and Query2.txt along with the DDL for the
tables and indicies and execution plans.

On our production environment we’re running at about 2 seconds (with
the cache warm); I’m getting a comparable speed on my playbox. It
seems to me like the Bitmap Heap Scan on proposal is the issue because
the recheck is throwing away enormous amounts of data. The
has_been_anonymised flag on the proposal is effectively a soft-delete;
so I’ve tried adding something like :

CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id, reference)
WHERE has_been_anonymised = false;

Which I was hoping would shrink the size of the index significantly
and encourage an index scan rather than bitmap, however it didn’t have
that effect. For reference:

Has_been_anonymised false:1534790
Has_been_anonymised true:7072192

Row counts over the whole table in question are :
Proposal.proposal:86069822340 MB
Proposal.note:2624423  1638 MB

Presumably I could partition proposal on has_been_anonymised, however
the row counts seem low enough that it feels a bit like overkill? We
also need referential integrity so I'll need to wait until that's in
(I think it's coming in PG12?)

If I decrease the number of legacy_organisation_id’s that are being
used then the query performance gets much better, but presumably
that’s because there’s a smaller dataset.

Any thoughts or ideas?

Thanks
Rob

-- 
 


A big Get Focused ‘thank you’ 

Why you 
should partner with an Agile company 



*
*
*Phone:* 0800 021 0888   Email: contac...@codeweavers.net 

Codeweavers Ltd | Barn 4 | Dunston 
Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 
04092394 | VAT registration no. 974 9705 63 



 
  
  

CREATE TABLE proposal.proposal
(
  id bigserial NOT NULL,
  reference uuid NOT NULL,
  system_id integer NOT NULL,
  legacy_id integer NOT NULL,
  organisation_id integer,
  legacy_organisation_id integer NOT NULL,
  has_been_anonymised boolean NOT NULL,
  external_source character varying,
  CONSTRAINT pk_proposal PRIMARY KEY (id),
  CONSTRAINT proposal_reference_key UNIQUE (reference),
  CONSTRAINT proposal_system_id_legacy_id_key UNIQUE (system_id, legacy_id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX proposal_has_been_anonymised_system_id_legacy_organisation__idx
  ON proposal.proposal
  USING btree
  (has_been_anonymised, system_id, legacy_organisation_id);

CREATE INDEX proposal_has_been_anonymised_system_id_legacy_organisation_idx1
  ON proposal.proposal
  USING btree
  (has_been_anonymised, system_id, legacy_organisation_id, reference);

--

CREATE TABLE proposal.note
(
  id bigserial NOT NULL,
  reference uuid NOT NULL,
  proposal_reference uuid NOT NULL,
  entry_time timestamp with time zone NOT NULL,
  legacy_originator_id integer,
  type_id integer NOT NULL,
  content text NOT NULL,
  legacy_read_by integer,
  time_read timestamp with time zone,
  "from" character varying(100),
  "to" character varying(100),
  originator_reference uuid,
  read_by_reference uuid,
  CONSTRAINT pk_note PRIMARY KEY (id),
  CONSTRAINT note_proposal_reference_fkey FOREIGN KEY (proposal_reference)
  REFERENCES proposal.proposal (reference) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT note_reference_key UNIQUE (reference)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX note_entry_time_type_id_idx
  ON proposal.note
  USING btree
  (entry_time, type_id)
  WHERE legacy_read_by IS NULL;

CREATE INDEX note_proposal_reference_idx
  ON proposal.note
  USING btree
  (proposal_reference);

CREATE INDEX note_proposal_reference_type_id_entry_time_idx
  ON proposal.note
  USING btree
  (proposal_reference, type_id, entry_time)
  WHERE legacy_read_by IS NULL;

CREATE INDEX note_type_id_entry_time_idx
  ON proposal.note
  USING btree
  (type_id, entry_time)
  WHERE legacy_read_by IS NULL;EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, COSTS TRUE, BUFFERS TRUE, TIMING TRUE) 
SELECT COUNT(1)
FROM proposal.proposal
INNER JOIN proposal.note 
ON proposal.note.proposal_reference = proposal.proposal.reference
WHERE proposal.proposal.system_id = '11'
AND proposal.proposal.legacy_organisation_id IN ('2', '6', '7', '8', 
'9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', 
'22', '23', '24', '25',

Re: Bitmap heap scan performance

2019-08-09 Thread Jeff Janes
On Fri, Aug 9, 2019 at 4:42 AM Rob Emery  wrote:


>
> It
> seems to me like the Bitmap Heap Scan on proposal is the issue because
> the recheck is throwing away enormous amounts of data.


Have you tried increasing work_mem?  The probable reason for the recheck is
that your bitmap overflows the allowed memory, and then switches
from storing every tid to storing just the block numbers.  As indicated by
the lossy part of "Heap Blocks: exact=3983 lossy=27989"

The
> has_been_anonymised flag on the proposal is effectively a soft-delete;
> so I’ve tried adding something like :
>
> CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id,
> reference)
> WHERE has_been_anonymised = false;
>
> Which I was hoping would shrink the size of the index significantly
>

The partial index should be smaller, but when comparing to the index with
"has_been_anonymised" as the leading column, it won't make a lot of
difference.  You only have to scan a smaller part of the larger index, and
the sizes of part of the index you have to scan in each case will be
roughly comparable.


> and encourage an index scan rather than bitmap, however it didn’t have
> that effect.


To encourage index scans over bitmap scans, you can increase
effective_cache_size.  Or to really force the issue, you can "set
enable_bitmapscan=off" but that is something you would usually do locally
for experimental purposes, not do it in production's config settings.

Cheers,

Jeff


Re: Postgres not using correct indices for views.

2019-08-09 Thread Tom Lane
[ re-adding list ]

"Thomas Rosenstein"  writes:
> On 9 Aug 2019, at 0:45, Tom Lane wrote:
>> However ... it sort of looks like the planner didn't even consider
>> the second plan shape in the "wrong" case.  If it had, then even
>> if it costed it 3X more than it did in the "right" case, the second
>> plan would still have won out by orders of magnitude.  So there's
>> something else going on.
>> 
>> Can you show the actual query and table and view definitions?

> View definition:
>   SELECT l.id,
>  l.created_at,
>  ...
>  togdpr(l.comment) AS comment,
>  ...
> FROM loans l;

Ah-hah.  I'd been thinking about permissions on the table and
view, but here's the other moving part: functions in the view.
I bet you were incautious about making this function definition
and allowed togdpr() to be marked volatile --- which it will
be by default.  That inhibits a lot of optimizations.

I'm guessing about what that function does, but if you could
safely mark it stable or even immutable, I bet this view would
behave better.

regards, tom lane