On Mon, Nov 11, 2013 at 4:30 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Mon, Nov 11, 2013 at 3:28 PM, Jim Nasby <jna...@enova.com> wrote:
>
>> On 11/11/13 4:57 PM, Jeff Janes wrote:
>>
>>> On Mon, Nov 11, 2013 at 1:57 PM, Jim Nasby <jna...@enova.com> wrote:
>>>
>>> Btree indexes have special code that kills index-tuples when the
>>> table-tuple is dead-to-all, so only the first such query after the mass
>>> deletion becomes vacuum-eligible should be slow, even if a vacuum is
>>> not done. But if there are long-running transactions that prevent the
>>> dead rows from going out of scope, nothing can be done until those
>>> transactions go away.
>>
>> There is? I didn't know that, can you point me at code?
>
> git grep "kill_prior_tuple"
>
>> BTW, I originally had this, even after multiple queries:
>>
>>     Buffers: shared hit=1 read=9476
>
> What were the timings like? Upon repeated execution it seems like all
> the buffers should be loaded and so be "hit", not "read".
>
>> Then vacuum:
>>
>>     INFO:  index "page_hits_raw_pkey" now contains 50343572 row
>>     versions in 182800 pages
>>     DETAIL:  3466871 index row versions were removed.
>>     44728 index pages have been deleted, 35256 are currently reusable.
>>
>> Then...
>>
>>     Buffers: shared hit=1 read=4
>>
>> So I suspect a vacuum is actually needed...
>
> Hmm. Maybe the kill method doesn't unlink the empty pages from the tree?

I verified that this is the case--the empty pages remain linked in the
tree until a vacuum removes them. But walking through empty leaf pages is
way faster than resolving pages full of pointers to dead-to-all tuples,
so the kill code still gives a huge benefit. But of course nothing will
do much good until the transaction horizon advances.

Cheers,

Jeff
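[Editor's note: the kill mechanism discussed in this thread can be sketched as a toy model. This is illustrative Python, not PostgreSQL's actual C code (the real implementation is in the nbtree code that `git grep "kill_prior_tuple"` turns up); the `ToyIndex` class and its structure are invented for this sketch. It models the two behaviors observed above: the first scan after a mass deletion pays the heap visits and sets dead hints, later scans skip the heap fetches, but the emptied pages stay linked in the tree until a vacuum.]

```python
# Toy model of the btree kill_prior_tuple (LP_DEAD hint) optimization.
# Not PostgreSQL code -- a sketch under assumed, simplified structures.

class ToyIndex:
    def __init__(self, heap):
        self.heap = heap                      # tid -> "live" or "dead"
        # crude "leaf pages": four index entries per page
        tids = list(heap)
        self.pages = [tids[i:i + 4] for i in range(0, len(tids), 4)]
        self.killed = set()                   # tids with the dead hint set

    def scan(self):
        """Return live tids, plus a heap-fetch count to show the saving."""
        fetches = 0
        result = []
        for page in self.pages:               # empty pages are still walked
            for tid in page:
                if tid in self.killed:
                    continue                  # hint already set: no heap fetch
                fetches += 1                  # must visit the heap to check
                if self.heap[tid] == "dead":
                    self.killed.add(tid)      # "kill" the index entry
                else:
                    result.append(tid)
        return result, fetches

    def vacuum(self):
        """Only vacuum removes killed entries and unlinks empty pages."""
        self.pages = [[t for t in p if t not in self.killed]
                      for p in self.pages]
        self.pages = [p for p in self.pages if p]
        self.killed.clear()

# tids 0-5 are dead-to-all after a mass deletion; 6-7 are live
heap = {t: ("dead" if t < 6 else "live") for t in range(8)}
idx = ToyIndex(heap)
live1, fetches1 = idx.scan()   # first scan pays 8 heap visits, sets hints
live2, fetches2 = idx.scan()   # second scan fetches only the 2 live tuples
idx.vacuum()                   # only now are emptied pages unlinked
```

The second scan still walks every page even though most entries are killed, which mirrors why the `Buffers: ... read=9476` figure only collapsed to `read=4` after the vacuum unlinked the empty pages.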