Thanks Justin.

The 4ms different in the examples isn't an accurate benchmark. I'm seeing
about a ~20% difference over a larger sample size. And this is on a fork of
the production database.

Apart from the end-performance, I'm motivated to figure out why one index
results in an index scan whereas the other one does not.

I didn't mention this in my original email but I've separately tested
dropping the `state` index, running VACUUM FULL on the table, then
recreating both indexes. The result was the same where querying on state
produced an index scan whereas closed produced a bitmap scan.

Andrew's email and Michael's follow-up has me curious because it suggests
I'm running into a issue specific to indexing on IS NULL, @Justin what do
you think of this?

In the meantime Justin I'll investigate some more of your suggestions.

On Tue, Feb 19, 2019 at 9:37 PM Justin Pryzby <pry...@telsasoft.com> wrote:

> On Tue, Feb 19, 2019 at 05:10:43PM -0700, Abi Noda wrote:
> > I have a table as defined below. The table contains 1,027,616 rows,
> 50,349
> > of which have state='open' and closed IS NULL. Since closed IS NULL for
> all
> > rows where state='open', I want to remove the unnecessary state column.
> >
> > CREATE TABLE tickets (
> >   id bigserial primary key,
> >   state character varying,
> >   closed timestamp,
> ...
> > );
> >
> > CREATE INDEX  "state_index" ON "tickets" ("state") WHERE ((state)::text =
> > 'open'::text));
> >
> > As part of the process of removing the state column, I am trying to index
> > the closed column so I can achieve equal query performance (index scan)
> as
> > when I query on the state column as shown below:
> >
> > EXPLAIN ANALYZE select title, created, closed, updated from tickets
> where state = 'open';
> > Index Scan using state_index on tickets  (cost=0.29..23430.20 rows=50349
> width=64) (actual time=17.221..52.110 rows=51533 loops=1)
> >
> > However, when I index the closed column, a bitmap scan is used instead of
> > an index scan, with slightly slower performance. Why isn't an index scan
> > being used, given that the exact same number of rows are at play as in my
> > query on the state column? How do I index closed in a way where an index
> > scan is used?
> >
> > CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;
> > EXPLAIN ANALYZE select title, created, closed, updated from tickets
> where closed IS NULL;
> > Bitmap Heap Scan on tickets  (cost=824.62..33955.85 rows=50349 width=64)
> (actual time=10.420..56.095 rows=51537 loops=1)
> >   ->  Bitmap Index Scan on closed_index  (cost=0.00..812.03 rows=50349
> width=0) (actual time=6.005..6.005 rows=51537 loops=1)
>
> Are you really concerned about 4ms ?  If this is a toy-sized test system,
> please try on something resembling production, perhaps by loading
> production or
> fake data, or perhaps on a production system within a transactions (begin;
> CREATE
> INDEX CONCURRENTLY; explain ...; rollback).
>
> You can see that most of the estimated cost is from the table (the index
> scan
> accounts for only 812 of total 33955 cost units).  So I'm guessing the
> planner
> thinks that an index scan will either 1) access the table randomly;
> and/or, 2)
> access a large fraction of the table.
>
> If it was just built, the first (partial/conditional/predicate/where) index
> will scan table in its "physical" order (if not sequentially).
>
> The 2nd index is going to scan table in order of ID, which I'm guessing is
> not
> "correlated" with its physical order, so an index scan cost is computed as
> accessing a larger fraction of the table (but by using an "bitmap" scan
> it's at
> least in physical order).  In fact: 50349/17478 = ~3 tuples/page is low, so
> you're accessing a large fraction of the table to return a small fraction
> of
> its tuples.
>
> You can check what it thinks here:
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
>
> You could try CLUSTERing the table on ID (which requires a non-partial
> index)
> and ANALYZEing (which might cause this and other queries to be planned
> and/or
> perform differently).  That causes the table to be locked exclusively.
> Then,
> the planner knows that scanning index and returning results ordered by IDs
> (which doesn't matter) will also access table in physical order (which
> matters), and maybe fewer pages need to be read, too.
>
> Justin
>

Reply via email to