Bump, and curious if anyone on hackers has any ideas here: of particular interest is why the (pk, created_at) index can possibly be more valuable than the (created_at, pk) variant since the former effectively implies having to scan the entire index. On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc...@gmail.com> wrote:
> I have the following tables: > - m(pk bigserial primary key, status text): with a single row > - s(pk bigserial primary key, status text, action_at date, m_fk bigint): > * 80% of the data has action_at between the current date and 1 year ago > and status of E or C > * 20% of the data has action_at between 5 days ago and 25 days into the > future and status of P, PD, or A > > I have two partial indexes: > - s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A') > - s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A') > > With the query: > SELECT s.pk FROM s > INNER JOIN m ON m.pk = s.m_fk > WHERE > s.status IN ('A', 'PD', 'P') > AND (action_at <= '2018-09-06') > AND s.status IN ('A', 'P') > AND m.status = 'A'; > > I generally expect the index s_action_at_pk to always be preferred over > s_pk_action_at. And on stock Postgres it does in fact use that index (with > a bitmap index scan). > > We like to set random_page_cost = 2 since we use fast SSDs only. With that > change Postgres strongly prefers the index s_pk_action_at unless I both > disable the other index and turn off bitmap heap scans. > > I'm attaching the following plans: > - base_plan.txt: default costs; both indexes available > - base_plan_rpc2.txt: random_page_cost = 2; both indexes available > - inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk > available > - inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; > enable_bitmapscan = false; only s_action_at_pk available > > A couple of questions: > - How is s_pk_action_at ever efficient to scan? Given that the highest > cardinality (primary key) column is first, wouldn't an index scan > effectively have to scan the entire index? > - Why does index scan on s_action_at_pk reads over 2x as many blocks as > the bitmap heap scan with the same index? > - Would you expect Postgres to generally always prefer using the > s_action_at_pk index over the s_pk_action_at index for this query? I > realize changing the random page cost is part of what's driving this, but I > still can't imagine reading the full s_pk_action_at index (assuming that's > what it is doing) could ever be more valuable. > > As a side note, the planner is very bad at understanding a query that > happens (I realize you wouldn't write this by hand, but ORMs) when you have > a where clause like: > s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P') > the row estimates are significantly different from a where clause with > only: > s.status IN ('A', 'P') > even though semantically those are identical. > > >