[HACKERS] TODO item: Allow data to be pulled directly from indexes

2008-06-29 Thread Karl Schnaitter
Sometime last year, a discussion started about including visibility 
metadata to avoid heap fetches during an index scan:


http://archives.postgresql.org/pgsql-patches/2007-10/msg00166.php
http://archives.postgresql.org/pgsql-patches/2008-01/msg00049.php

I think the last discussion on this was in April:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00618.php (last 
item)


I have worked with the current patch, and I have some thoughts about 
that approach and the approaches listed in the TODO item. The TODO lists 
three approaches, in short


(1) Add a bit for an index tuple that indicates "visible" or "maybe 
visible."
(2) Use a per-table bitmap that indicates which pages have at least one 
tuple that is not visible to all transactions.

(3) Same as (2) but at the granularity of one bit per table.

The approach in the patch is different:

(4) Add transaction ids, etc to the index tuple (totaling 16 bytes)

I would group (1) & (4) together and (2) & (3) together. I think the 
time and space trade-offs are pretty obvious, so I won't waste time on 
those.


(1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. 
Tom has noted (in the linked message) that this is not reliable if the 
index has any expression-valued columns, because it is not always 
possible to find the old index entry. For this reason, the proposed 
patch does not keep visibility metadata for indexes on expressions. This 
seems like a reasonable limitation --- indexed expressions are just less 
efficient.


The main difference between (1) & (4) is that (1) will sometimes require 
heap lookups and (4) never will. Moreover, the heap lookups in (1) will 
be difficult for the optimizer to estimate, unless some special 
statistics can be maintained for this purpose.


I should mention there is a major flaw in the patch, because it puts 
pointers to HOT tuples in the index, in order to capture the different 
transaction ids in the chain. I think this can be fixed by only pointing 
to the root of the HOT chain, and setting xmin/xmax to the entire range 
of transaction ids spanned by the chain. I'm not sure about all the 
details (the ctid and some other bits also need to be set).


(2) & (3) can work for any index, and they are quite elegant in the way 
that the overhead does not change with the number of indexes. The TODO 
also notes the benefit of (2) for efficient vacuuming. Thus, I think 
that (2) is a great idea in general, but it does not serve the intended 
purpose of this TODO item. Once a page gets marked as requiring 
visibility checks, it cannot be unmarked until the next VACUUM. The 
whole point of this feature is that we are willing to be more proactive 
during updates in order to make index access more efficient.


So in summary, I think that (2) would be nice as a separate feature, 
with (1) and (4) being more favorable for index-only scans. The obvious 
trouble with (4) is the extra space overhead. There are also issues with 
correctness that I mentioned (any thoughts here would be appreciated). 
Other than that, I would favor (4) because it offers the most stable 
performance.


Please let me know if you agree/disagree with anything here. I need to 
get this feature implemented for my research, but I would also love to 
contribute it to the community so your opinions matter a lot.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes

2008-06-29 Thread Gregory Stark
"Karl Schnaitter" <[EMAIL PROTECTED]> writes:

"Karl Schnaitter" <[EMAIL PROTECTED]> writes:

> (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has
> noted (in the linked message) that this is not reliable if the index has any
> expression-valued columns, because it is not always possible to find the old
> index entry. For this reason, the proposed patch does not keep visibility
> metadata for indexes on expressions. This seems like a reasonable limitation
> --- indexed expressions are just less efficient.

Or if the index operators and btproc aren't nearly as immutable as they claim.
Probably less likely than non-immutable index expressions but also possible.

> I should mention there is a major flaw in the patch, because it puts pointers
> to HOT tuples in the index, in order to capture the different transaction ids
> in the chain. I think this can be fixed by only pointing to the root of the 
> HOT
> chain, and setting xmin/xmax to the entire range of transaction ids spanned by
> the chain. I'm not sure about all the details (the ctid and some other bits
> also need to be set).

I think you can think of a HOT chain as a single tuple. The xmin of the head
is the xmin of the chain and the xmax of the tail is the xmax of the chain.
The xmin/xmax of the intermediate versions are only interesting for
determining *which* of the HOT versions to look at, but the index pointer
points to the whole chain.

> (2) & (3) can work for any index, and they are quite elegant in the way that
> the overhead does not change with the number of indexes. The TODO also notes
> the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great
> idea in general, but it does not serve the intended purpose of this TODO item.
> Once a page gets marked as requiring visibility checks, it cannot be unmarked
> until the next VACUUM. The whole point of this feature is that we are willing
> to be more proactive during updates in order to make index access more
> efficient.

Well I think that's precisely the point. If you're trading off work done at
update time against work done for index accesses then you're only going to win
if the tuples are relatively static and have lots of accesses done against
them between updates. In which case having the optimization only kick in when
the page has been static for long enough that all the tuples are globally
visible should be good enough.

The case where index visibility info might win over a visibility map might be
if the tuples are being heavily updated by long-lived transactions. In which
case they never sit globally visible for very long but having the xmin/xmax in
the index might avoid having to do a heap access for tuples which haven't been
committed yet.

As you seem to realize there has been a lot of discussion in this area
already. The visibility map looks like a much more popular direction.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP patch: reducing overhead for repeat de-TOASTing

2008-06-29 Thread Tom Lane
Attached is a worked-out patch for the approach proposed here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
namely, that cache management for de-TOASTed datums is handled
by TupleTableSlots.

To avoid premature detoasting of values that we might never need, the
patch introduces a concept of an "indirect TOAST pointer", which has
the same 0x80 or 0x01 header as an external TOAST pointer, but can
be told apart by having a different length byte.  Within that we have
* pointer to original toasted field within the Slot's tuple
* pointer to the owning Slot
* pointer to decompressed copy, or NULL if not decompressed yet
Some fairly straightforward extensions to the TupleTableSlot code,
heaptuple.c, and tuptoaster.c make it all go.

My original thoughts had included turning FREE_IF_COPY() into a no-op,
but on investigation that seems impractical.  One case that still
depends on that pfree is where we have palloc'd a 4-byte-header copy
of a short-header datum to support code that needs properly aligned
datum content.  The solution adopted in the patch is to arrange for
pfree() applied to a cacheable detoasted object to be a no-op, whereas
it still works normally for non-cached detoasted objects.  We do this
by inserting a dummy chunk header that points to a dummy memory context
whose pfree support method does nothing.  I think this part of the patch
would be required for any toast caching method, not just this one.

What I like about this patch is that it's a fairly small-footprint
change, it doesn't add much overhead, and it covers caching of
decompression for in-line-compressed datums as well as the out-of-line
case.

One thing I really *don't* like about it is that it requires everyplace
that copies Datums to know about indirect pointers: in general, the copy
must be a copy of the original toasted Datum, not of the indirect
pointer, else we have indirect pointers that can outlive their owning
TupleTableSlot (or at least outlive its current tuple cycle).  There
only seem to be about half a dozen such places in the current code,
but still it seems a rather fragile coding rule.

After playing with it for a little bit, I'm not convinced that it buys
enough performance win to be worth applying --- the restriction of cache
lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
(For example, sorts that involve toasted sort keys continue to suck,
because the tuples being sorted aren't in Slots.)  It would probably
fix the specific case that the PostGIS hackers were complaining of,
but I think we need something more.

Still, I wanted to get it into the archives because the idea of indirect
toast pointers might be useful for something else.

regards, tom lane



bin3ziOy60Bud.bin
Description: toast-indirect-1.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes

2008-06-29 Thread Karl Schnaitter

Gregory Stark wrote:

(1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has
noted (in the linked message) that this is not reliable if the index has any
expression-valued columns, because it is not always possible to find the old
index entry. For this reason, the proposed patch does not keep visibility
metadata for indexes on expressions. This seems like a reasonable limitation
--- indexed expressions are just less efficient.



Or if the index operators and btproc aren't nearly as immutable as they claim.
Probably less likely than non-immutable index expressions but also possible.

  

Your point is well taken... I'll have to look into that more.


(2) & (3) can work for any index, and they are quite elegant in the way that
the overhead does not change with the number of indexes. The TODO also notes
the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great
idea in general, but it does not serve the intended purpose of this TODO item.
Once a page gets marked as requiring visibility checks, it cannot be unmarked
until the next VACUUM. The whole point of this feature is that we are willing
to be more proactive during updates in order to make index access more
efficient.



Well I think that's precisely the point. If you're trading off work done at
update time against work done for index accesses then you're only going to win
if the tuples are relatively static and have lots of accesses done against
them between updates. In which case having the optimization only kick in when
the page has been static for long enough that all the tuples are globally
visible should be good enough
I really don't understand this point. The way I see the visibility map 
working is as follows: we set a page to "requires visibility check" when 
a tuple on the page is inserted, deleted, or non-HOT updated. If the 
only modifications have been inserts, we can reset the status to "all 
tuples visible" when these tuples become universally visible, which 
matches your description. But in the presence of deletes and updates, we 
can only reset the status of a page after a VACUUM (I know that dead HOT 
tuples can be pruned without VACUUM, but I don't think that's the case 
for indexed tuples). We can't reset the status earlier because we don't 
know what indexes still have pointers to the dead tuples. So a page can 
be static indefinitely (after a single modification) without ever 
getting to enjoy the optimization.


This is a really important point, so please let me know if I'm missing 
something.


Thanks for your response!
Karl

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)

2008-06-29 Thread Robert Treat
On Friday 27 June 2008 12:58:41 Richard Huxton wrote:
> Richard Huxton wrote:
> > Richard Huxton wrote:
> >> At present it means you can't reliably do:
> >>  DROP DATABASE foo;
> >>  pg_restore --create foo.dump
> >> I'd then have to either hand edit the dumpall dump or wade through a
> >> bunch of errors checking that none of them were relevant.
> >
> > Actually, I'm not sure pg_dumpall does them either.
>
> [snip]
>
> > Am I doing something stupid here?
>
> OK - so to get the ALTER DATABASE commands I need to dump the schema for
> the entire cluster. Is that really desired behaviour?
>

Certainly not desired by a number of people I have talked to, but I don't have 
much hope in seeing the behavoir change... perhaps someday if we get around 
to merging pg_dump and pg_dumpall

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)

2008-06-29 Thread Alvaro Herrera
Robert Treat wrote:
> On Friday 27 June 2008 12:58:41 Richard Huxton wrote:

> > > Am I doing something stupid here?
> >
> > OK - so to get the ALTER DATABASE commands I need to dump the schema for
> > the entire cluster. Is that really desired behaviour?
> 
> Certainly not desired by a number of people I have talked to, but I don't 
> have 
> much hope in seeing the behavoir change... perhaps someday if we get around 
> to merging pg_dump and pg_dumpall

I have never heard anyone say the current behavior is something they desired.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)

2008-06-29 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Robert Treat wrote:
>> Certainly not desired by a number of people I have talked to, but I don't 
>> have 
>> much hope in seeing the behavoir change... perhaps someday if we get around 
>> to merging pg_dump and pg_dumpall

> I have never heard anyone say the current behavior is something they desired.

So put forward a worked-out proposal for some other behavior.

My first thought is that the -c and -C options create a lot of the
issues in this area.  -c in particular is evidently meant for merging a
dump into a database that already contains unrelated objects.  (In fact
you could argue that the *default* behavior is meant for this, -c just
changes the result for conflicts.)  It seems unlikely that having
pg_dump issue ALTER DATABASE SET commands is a good idea in all of these
scenarios.

I'm also wondering why it'd be bright to treat ALTER ... SET properties
different from, say, database owner and encoding properties.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A new take on the foot-gun meme

2008-06-29 Thread Tom Lane
[ after recovering from choking... ]

Tom "spot" Callaway presents a vivid new image in this line:

> What you're doing is analogous to using a loaded shotgun as a golf club,
> and what you're suggesting is that we take the safety off, because it
> interferes with your golf game.

https://www.redhat.com/archives/fedora-devel-list/2008-June/msg01501.html

So is that a "golf club gun"?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VirtualXactLockTableInsert

2008-06-29 Thread Simon Riggs

On Fri, 2008-06-27 at 18:00 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > In cases where we know we will assign a real xid, can we just skip the
> > assignment of the virtual xid completely?
> 
> Even if we could do this I doubt it would be a good idea.  It'd destroy
> the invariant that all transactions have a vxid, which at the very least
> would create naming problems.

Ahh, no, I meant go straight to assigning a real xid, to avoid the
wasted effort in inserting a vxid *and* a real xid.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VirtualXactLockTableInsert

2008-06-29 Thread Simon Riggs

On Fri, 2008-06-27 at 17:44 +0200, Florian G. Pflug wrote:
> Simon Riggs wrote:
> > When we move from having a virtual xid to having a real xid I don't
> > see any attempt to re-arrange the lock queues. Surely if there are
> > people waiting on the virtual xid, they must be moved across to wait
> > on the actual xid? Otherwise the locking queue will not be respected
> > because we have two things on which people might queue. Anybody
> > explain that?
> 
> Locks on real xids serve a different purpose than locks on virtual xids.
> Locks on real xids are used to wait for transaction who touched a
> certain tuple (in which case they certainly must have acquired a real
> xid) to end. Locks on vxids on the other hand are used to wait for the
> ending of transactions which either hold a certain lock or use a
> snapshot with a xmin earlier than some point in time.
> 
> indexcmds.c is the only place where VirtualXactLockTableWait() is used -
> the concurrent index creation needs to wait for all transactions to end
> which either might not know about the index (after phase 1 and 2), or
> who might still see tuples not included in the index (before marking the
> index valid).

Thanks,

So there is no attempt to migrate the vxid lock queue onto the xid lock
queue because it doesn't matter now/yet. That seems fragile, but as long
as we know about it we're OK.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join Removal/ Vertical Partitioning

2008-06-29 Thread Simon Riggs

On Fri, 2008-06-27 at 17:50 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
> >> It might be possible to treat "ignore the RHS" as a join strategy and
> >> try to apply it while forming join relations, which would be late enough
> >> to have all the needed info available.
> 
> > Oh, actually have a join node that is a no-op, with a path cost of zero?
> 
> Not even that: just return the best path(s) for the LHS as the paths for
> the joinrel.

Much neater. Cool.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers