On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.ja...@gmail.com> writes:
>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> There are at least three ways we could whack that mole: ...
>>>>
>>>> * Keep a separate list (or data structure of your choice) so that
>>>> relcache entries created in the current xact could be found directly
>>>> rather than having to scan the whole relcache. That'd add complexity
>>>> though, and could perhaps be a net loss for cases where the relcache
>>>> isn't so bloated.
>>
>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>> table one recently added. The overhead of that should be very low.
>>
>>> Are the three places where "need_eoxact_work = true;" the only places
>>> where things need to be added to the new structure?
>>
>> Yeah. The problem is not so much the number of places that do that,
>> as that places that flush entries from the relcache would need to know
>> to remove them from the separate list, else you'd have dangling
>> pointers.
>
> If the list is of hash-tags rather than pointers, all we would have to
> do is ignore entries that are not still in the hash table, right?
>
I've attached a proof-of-concept patch to implement this.

I got rid of need_eoxact_work entirely and replaced it with a short list
that fulfills the functions of indicating that work is needed, and
suggesting which rels might need that work. There is no attempt to prevent
duplicates, nor to remove invalidated entries from the list. Invalid
entries are skipped when the hash entry is not found, and processing is
idempotent, so duplicates are not a problem.

Formally speaking, if MAX_EOXACT_LIST were 0, so that the list overflowed
the first time it was accessed, then it would be identical to the current
behavior of having only a flag. So formally all I did was increase the max
from 0 to 10.

I wasn't so sure about the idempotent nature of subtransaction processing,
so I chickened out and left that part alone. I know of no workflow for
which that was a bottleneck.

AtEOXact_release is oddly indented because that makes the patch smaller
and easier to read.

This makes the non-"-1" restore of large dumps very much faster (and makes
them faster than "-1" restores, as well).

I added a "create temp table foo (x integer) on commit drop;" line to the
default pgbench transaction and tested that. I was hoping to see a
performance improvement there as well (the transaction has ~110 entries in
the RelationIdCache at eoxact each time), but the performance was too
variable (probably due to the intense I/O it causes) to detect any
changes. At least it is not noticeably slower. If I hack pgbench to bloat
the RelationIdCache by touching 20,000 useless tables as part of the
connection start-up process, then this patch does show a win.

It is not obvious what value to set the MAX list size to. Since this array
is only allocated once per backend, and since it is not groveled through
to invalidate relations at each invalidation, there is no particular
reason it must be small. But if the same table is assigned new filenodes
(or forced index lists, whatever those are) repeatedly within a
transaction, the list could become bloated with duplicate entries,
potentially becoming even larger than the hash table whose scan it is
intended to short-cut. In any event, 10 seems to be large enough to
overcome the currently known bottleneck. Maybe 100 would be a more
principled number, as that is about where the list could start to become
as big as the basal size of the RelationIdCache table.

I don't think this patch replaces having some mechanism for restricting
how large RelationIdCache can get, or how old its LRU entries can get, as
Robert suggested. But this approach seems easier to implement and agree
upon, and it doesn't preclude doing other optimizations later.

Cheers,

Jeff
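P.S. In case the shape of the idea is easier to see than the diff, here is
a rough sketch of the mechanism. This is illustrative only; aside from
MAX_EOXACT_LIST, the function and variable names are made up for this
email and need not match what the attached patch actually uses.

/*
 * Sketch of the short-list approach.  Assumes the relcache.c context
 * (RelationIdCache HTAB keyed by Oid, RelIdCacheEnt entries); names
 * other than MAX_EOXACT_LIST are invented for illustration.
 */
#define MAX_EOXACT_LIST 10

static Oid  eoxact_list[MAX_EOXACT_LIST];
static int  eoxact_list_len = 0;
static bool eoxact_list_overflowed = false;

/*
 * Called wherever we used to set need_eoxact_work = true.  Duplicates
 * are fine; on overflow we simply degrade to the old scan-everything
 * behavior.
 */
static void
EOXactListAdd(Oid relid)
{
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid;
    else
        eoxact_list_overflowed = true;
}

/*
 * At end of transaction: if the list never overflowed, look up only the
 * listed rels; a failed lookup just means the rel was already flushed
 * from the relcache, so skip it.  Otherwise fall back to the full
 * hash_seq scan of RelationIdCache, exactly as the code does today.
 */
static void
AtEOXact_RelationCache_sketch(bool isCommit)
{
    if (eoxact_list_overflowed)
    {
        /* existing hash_seq_search() loop over all of RelationIdCache */
    }
    else
    {
        int         i;

        for (i = 0; i < eoxact_list_len; i++)
        {
            RelIdCacheEnt *idhentry;

            idhentry = (RelIdCacheEnt *) hash_search(RelationIdCache,
                                                     &eoxact_list[i],
                                                     HASH_FIND, NULL);
            if (idhentry == NULL)
                continue;       /* invalidated since it was listed */
            /* do the per-relation end-of-xact work on idhentry->reldesc */
        }
    }
    eoxact_list_len = 0;
    eoxact_list_overflowed = false;
}

The point being that overflow degrades gracefully to the current behavior,
and a stale list entry costs only one failed hash lookup, so nothing has
to remove entries from the list when a relcache entry is flushed.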
relcache_list_v1.patch