Thanks Justin. The 4ms different in the examples isn't an accurate benchmark. I'm seeing about a ~20% difference over a larger sample size. And this is on a fork of the production database.
Apart from the end-performance, I'm motivated to figure out why one index results in an index scan whereas the other one does not. I didn't mention this in my original email but I've separately tested dropping the `state` index, running VACUUM FULL on the table, then recreating both indexes. The result was the same where querying on state produced an index scan whereas closed produced a bitmap scan. Andrew's email and Michael's follow-up has me curious because it suggests I'm running into a issue specific to indexing on IS NULL, @Justin what do you think of this? In the meantime Justin I'll investigate some more of your suggestions. On Tue, Feb 19, 2019 at 9:37 PM Justin Pryzby <pry...@telsasoft.com> wrote: > On Tue, Feb 19, 2019 at 05:10:43PM -0700, Abi Noda wrote: > > I have a table as defined below. The table contains 1,027,616 rows, > 50,349 > > of which have state='open' and closed IS NULL. Since closed IS NULL for > all > > rows where state='open', I want to remove the unnecessary state column. > > > > CREATE TABLE tickets ( > > id bigserial primary key, > > state character varying, > > closed timestamp, > ... > > ); > > > > CREATE INDEX "state_index" ON "tickets" ("state") WHERE ((state)::text = > > 'open'::text)); > > > > As part of the process of removing the state column, I am trying to index > > the closed column so I can achieve equal query performance (index scan) > as > > when I query on the state column as shown below: > > > > EXPLAIN ANALYZE select title, created, closed, updated from tickets > where state = 'open'; > > Index Scan using state_index on tickets (cost=0.29..23430.20 rows=50349 > width=64) (actual time=17.221..52.110 rows=51533 loops=1) > > > > However, when I index the closed column, a bitmap scan is used instead of > > an index scan, with slightly slower performance. Why isn't an index scan > > being used, given that the exact same number of rows are at play as in my > > query on the state column? How do I index closed in a way where an index > > scan is used? > > > > CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL; > > EXPLAIN ANALYZE select title, created, closed, updated from tickets > where closed IS NULL; > > Bitmap Heap Scan on tickets (cost=824.62..33955.85 rows=50349 width=64) > (actual time=10.420..56.095 rows=51537 loops=1) > > -> Bitmap Index Scan on closed_index (cost=0.00..812.03 rows=50349 > width=0) (actual time=6.005..6.005 rows=51537 loops=1) > > Are you really concerned about 4ms ? If this is a toy-sized test system, > please try on something resembling production, perhaps by loading > production or > fake data, or perhaps on a production system within a transactions (begin; > CREATE > INDEX CONCURRENTLY; explain ...; rollback). > > You can see that most of the estimated cost is from the table (the index > scan > accounts for only 812 of total 33955 cost units). So I'm guessing the > planner > thinks that an index scan will either 1) access the table randomly; > and/or, 2) > access a large fraction of the table. > > If it was just built, the first (partial/conditional/predicate/where) index > will scan table in its "physical" order (if not sequentially). > > The 2nd index is going to scan table in order of ID, which I'm guessing is > not > "correlated" with its physical order, so an index scan cost is computed as > accessing a larger fraction of the table (but by using an "bitmap" scan > it's at > least in physical order). In fact: 50349/17478 = ~3 tuples/page is low, so > you're accessing a large fraction of the table to return a small fraction > of > its tuples. > > You can check what it thinks here: > > https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram > > You could try CLUSTERing the table on ID (which requires a non-partial > index) > and ANALYZEing (which might cause this and other queries to be planned > and/or > perform differently). That causes the table to be locked exclusively. > Then, > the planner knows that scanning index and returning results ordered by IDs > (which doesn't matter) will also access table in physical order (which > matters), and maybe fewer pages need to be read, too. > > Justin >