[HACKERS] TODO item: Allow data to be pulled directly from indexes
Sometime last year, a discussion started about including visibility metadata to avoid heap fetches during an index scan: http://archives.postgresql.org/pgsql-patches/2007-10/msg00166.php http://archives.postgresql.org/pgsql-patches/2008-01/msg00049.php I think the last discussion on this was in April: http://archives.postgresql.org/pgsql-hackers/2008-04/msg00618.php (last item) I have worked with the current patch, and I have some thoughts about that approach and the approaches listed in the TODO item. The TODO lists three approaches, in short (1) Add a bit for an index tuple that indicates "visible" or "maybe visible." (2) Use a per-table bitmap that indicates which pages have at least one tuple that is not visible to all transactions. (3) Same as (2) but at the granularity of one bit per table. The approach in the patch is different: (4) Add transaction ids, etc to the index tuple (totaling 16 bytes) I would group (1) & (4) together and (2) & (3) together. I think the time and space trade-offs are pretty obvious, so I won't waste time on those. (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has noted (in the linked message) that this is not reliable if the index has any expression-valued columns, because it is not always possible to find the old index entry. For this reason, the proposed patch does not keep visibility metadata for indexes on expressions. This seems like a reasonable limitation --- indexed expressions are just less efficient. The main difference between (1) & (4) is that (1) will sometimes require heap lookups and (4) never will. Moreover, the heap lookups in (1) will be difficult for the optimizer to estimate, unless some special statistics can be maintained for this purpose. I should mention there is a major flaw in the patch, because it puts pointers to HOT tuples in the index, in order to capture the different transaction ids in the chain. I think this can be fixed by only pointing to the root of the HOT chain, and setting xmin/xmax to the entire range of transaction ids spanned by the chain. I'm not sure about all the details (the ctid and some other bits also need to be set). (2) & (3) can work for any index, and they are quite elegant in the way that the overhead does not change with the number of indexes. The TODO also notes the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great idea in general, but it does not serve the intended purpose of this TODO item. Once a page gets marked as requiring visibility checks, it cannot be unmarked until the next VACUUM. The whole point of this feature is that we are willing to be more proactive during updates in order to make index access more efficient. So in summary, I think that (2) would be nice as a separate feature, with (1) and (4) being more favorable for index-only scans. The obvious trouble with (4) is the extra space overhead. There are also issues with correctness that I mentioned (any thoughts here would be appreciated). Other than that, I would favor (4) because it offers the most stable performance. Please let me know if you agree/disagree with anything here. I need to get this feature implemented for my research, but I would also love to contribute it to the community so your opinions matter a lot. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes
"Karl Schnaitter" <[EMAIL PROTECTED]> writes: "Karl Schnaitter" <[EMAIL PROTECTED]> writes: > (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has > noted (in the linked message) that this is not reliable if the index has any > expression-valued columns, because it is not always possible to find the old > index entry. For this reason, the proposed patch does not keep visibility > metadata for indexes on expressions. This seems like a reasonable limitation > --- indexed expressions are just less efficient. Or if the index operators and btproc aren't nearly as immutable as they claim. Probably less likely than non-immutable index expressions but also possible. > I should mention there is a major flaw in the patch, because it puts pointers > to HOT tuples in the index, in order to capture the different transaction ids > in the chain. I think this can be fixed by only pointing to the root of the > HOT > chain, and setting xmin/xmax to the entire range of transaction ids spanned by > the chain. I'm not sure about all the details (the ctid and some other bits > also need to be set). I think you can think of a HOT chain as a single tuple. The xmin of the head is the xmin of the chain and the xmax of the tail is the xmax of the chain. The xmin/xmax of the intermediate versions are only interesting for determining *which* of the HOT versions to look at, but the index pointer points to the whole chain. > (2) & (3) can work for any index, and they are quite elegant in the way that > the overhead does not change with the number of indexes. The TODO also notes > the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great > idea in general, but it does not serve the intended purpose of this TODO item. > Once a page gets marked as requiring visibility checks, it cannot be unmarked > until the next VACUUM. The whole point of this feature is that we are willing > to be more proactive during updates in order to make index access more > efficient. Well I think that's precisely the point. If you're trading off work done at update time against work done for index accesses then you're only going to win if the tuples are relatively static and have lots of accesses done against them between updates. In which case having the optimization only kick in when the page has been static for long enough that all the tuples are globally visible should be good enough. The case where index visibility info might win over a visibility map might be if the tuples are being heavily updated by long-lived transactions. In which case they never sit globally visible for very long but having the xmin/xmax in the index might avoid having to do a heap access for tuples which haven't been committed yet. As you seem to realize there has been a lot of discussion in this area already. The visibility map looks like a much more popular direction. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP patch: reducing overhead for repeat de-TOASTing
Attached is a worked-out patch for the approach proposed here: http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php namely, that cache management for de-TOASTed datums is handled by TupleTableSlots. To avoid premature detoasting of values that we might never need, the patch introduces a concept of an "indirect TOAST pointer", which has the same 0x80 or 0x01 header as an external TOAST pointer, but can be told apart by having a different length byte. Within that we have * pointer to original toasted field within the Slot's tuple * pointer to the owning Slot * pointer to decompressed copy, or NULL if not decompressed yet Some fairly straightforward extensions to the TupleTableSlot code, heaptuple.c, and tuptoaster.c make it all go. My original thoughts had included turning FREE_IF_COPY() into a no-op, but on investigation that seems impractical. One case that still depends on that pfree is where we have palloc'd a 4-byte-header copy of a short-header datum to support code that needs properly aligned datum content. The solution adopted in the patch is to arrange for pfree() applied to a cacheable detoasted object to be a no-op, whereas it still works normally for non-cached detoasted objects. We do this by inserting a dummy chunk header that points to a dummy memory context whose pfree support method does nothing. I think this part of the patch would be required for any toast caching method, not just this one. What I like about this patch is that it's a fairly small-footprint change, it doesn't add much overhead, and it covers caching of decompression for in-line-compressed datums as well as the out-of-line case. One thing I really *don't* like about it is that it requires everyplace that copies Datums to know about indirect pointers: in general, the copy must be a copy of the original toasted Datum, not of the indirect pointer, else we have indirect pointers that can outlive their owning TupleTableSlot (or at least outlive its current tuple cycle). There only seem to be about half a dozen such places in the current code, but still it seems a rather fragile coding rule. After playing with it for a little bit, I'm not convinced that it buys enough performance win to be worth applying --- the restriction of cache lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive. (For example, sorts that involve toasted sort keys continue to suck, because the tuples being sorted aren't in Slots.) It would probably fix the specific case that the PostGIS hackers were complaining of, but I think we need something more. Still, I wanted to get it into the archives because the idea of indirect toast pointers might be useful for something else. regards, tom lane bin3ziOy60Bud.bin Description: toast-indirect-1.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes
Gregory Stark wrote: (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has noted (in the linked message) that this is not reliable if the index has any expression-valued columns, because it is not always possible to find the old index entry. For this reason, the proposed patch does not keep visibility metadata for indexes on expressions. This seems like a reasonable limitation --- indexed expressions are just less efficient. Or if the index operators and btproc aren't nearly as immutable as they claim. Probably less likely than non-immutable index expressions but also possible. Your point is well taken... I'll have to look into that more. (2) & (3) can work for any index, and they are quite elegant in the way that the overhead does not change with the number of indexes. The TODO also notes the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great idea in general, but it does not serve the intended purpose of this TODO item. Once a page gets marked as requiring visibility checks, it cannot be unmarked until the next VACUUM. The whole point of this feature is that we are willing to be more proactive during updates in order to make index access more efficient. Well I think that's precisely the point. If you're trading off work done at update time against work done for index accesses then you're only going to win if the tuples are relatively static and have lots of accesses done against them between updates. In which case having the optimization only kick in when the page has been static for long enough that all the tuples are globally visible should be good enough I really don't understand this point. The way I see the visibility map working is as follows: we set a page to "requires visibility check" when a tuple on the page is inserted, deleted, or non-HOT updated. If the only modifications have been inserts, we can reset the status to "all tuples visible" when these tuples become universally visible, which matches your description. But in the presence of deletes and updates, we can only reset the status of a page after a VACUUM (I know that dead HOT tuples can be pruned without VACUUM, but I don't think that's the case for indexed tuples). We can't reset the status earlier because we don't know what indexes still have pointers to the dead tuples. So a page can be static indefinitely (after a single modification) without ever getting to enjoy the optimization. This is a really important point, so please let me know if I'm missing something. Thanks for your response! Karl -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)
On Friday 27 June 2008 12:58:41 Richard Huxton wrote: > Richard Huxton wrote: > > Richard Huxton wrote: > >> At present it means you can't reliably do: > >> DROP DATABASE foo; > >> pg_restore --create foo.dump > >> I'd then have to either hand edit the dumpall dump or wade through a > >> bunch of errors checking that none of them were relevant. > > > > Actually, I'm not sure pg_dumpall does them either. > > [snip] > > > Am I doing something stupid here? > > OK - so to get the ALTER DATABASE commands I need to dump the schema for > the entire cluster. Is that really desired behaviour? > Certainly not desired by a number of people I have talked to, but I don't have much hope in seeing the behavoir change... perhaps someday if we get around to merging pg_dump and pg_dumpall -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)
Robert Treat wrote: > On Friday 27 June 2008 12:58:41 Richard Huxton wrote: > > > Am I doing something stupid here? > > > > OK - so to get the ALTER DATABASE commands I need to dump the schema for > > the entire cluster. Is that really desired behaviour? > > Certainly not desired by a number of people I have talked to, but I don't > have > much hope in seeing the behavoir change... perhaps someday if we get around > to merging pg_dump and pg_dumpall I have never heard anyone say the current behavior is something they desired. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Does anything dump per-database config settings? (was Re: [HACKERS] ALTER DATABASE vs pg_dump)
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Robert Treat wrote: >> Certainly not desired by a number of people I have talked to, but I don't >> have >> much hope in seeing the behavoir change... perhaps someday if we get around >> to merging pg_dump and pg_dumpall > I have never heard anyone say the current behavior is something they desired. So put forward a worked-out proposal for some other behavior. My first thought is that the -c and -C options create a lot of the issues in this area. -c in particular is evidently meant for merging a dump into a database that already contains unrelated objects. (In fact you could argue that the *default* behavior is meant for this, -c just changes the result for conflicts.) It seems unlikely that having pg_dump issue ALTER DATABASE SET commands is a good idea in all of these scenarios. I'm also wondering why it'd be bright to treat ALTER ... SET properties different from, say, database owner and encoding properties. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] A new take on the foot-gun meme
[ after recovering from choking... ] Tom "spot" Callaway presents a vivid new image in this line: > What you're doing is analogous to using a loaded shotgun as a golf club, > and what you're suggesting is that we take the safety off, because it > interferes with your golf game. https://www.redhat.com/archives/fedora-devel-list/2008-June/msg01501.html So is that a "golf club gun"? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VirtualXactLockTableInsert
On Fri, 2008-06-27 at 18:00 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > In cases where we know we will assign a real xid, can we just skip the > > assignment of the virtual xid completely? > > Even if we could do this I doubt it would be a good idea. It'd destroy > the invariant that all transactions have a vxid, which at the very least > would create naming problems. Ahh, no, I meant go straight to assigning a real xid, to avoid the wasted effort in inserting a vxid *and* a real xid. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VirtualXactLockTableInsert
On Fri, 2008-06-27 at 17:44 +0200, Florian G. Pflug wrote: > Simon Riggs wrote: > > When we move from having a virtual xid to having a real xid I don't > > see any attempt to re-arrange the lock queues. Surely if there are > > people waiting on the virtual xid, they must be moved across to wait > > on the actual xid? Otherwise the locking queue will not be respected > > because we have two things on which people might queue. Anybody > > explain that? > > Locks on real xids serve a different purpose than locks on virtual xids. > Locks on real xids are used to wait for transaction who touched a > certain tuple (in which case they certainly must have acquired a real > xid) to end. Locks on vxids on the other hand are used to wait for the > ending of transactions which either hold a certain lock or use a > snapshot with a xmin earlier than some point in time. > > indexcmds.c is the only place where VirtualXactLockTableWait() is used - > the concurrent index creation needs to wait for all transactions to end > which either might not know about the index (after phase 1 and 2), or > who might still see tuples not included in the index (before marking the > index valid). Thanks, So there is no attempt to migrate the vxid lock queue onto the xid lock queue because it doesn't matter now/yet. That seems fragile, but as long as we know about it we're OK. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join Removal/ Vertical Partitioning
On Fri, 2008-06-27 at 17:50 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote: > >> It might be possible to treat "ignore the RHS" as a join strategy and > >> try to apply it while forming join relations, which would be late enough > >> to have all the needed info available. > > > Oh, actually have a join node that is a no-op, with a path cost of zero? > > Not even that: just return the best path(s) for the LHS as the paths for > the joinrel. Much neater. Cool. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers