[ for the archives' sake ]

I wrote:
> I had a thought about how to make get_tabstat_entry() faster without
> adding overhead: what if we just plain remove the search, and always
> assume that a new entry has to be added to the tabstat array?
I spent some time looking into this idea.  It doesn't really work, because
there are places that will break if a transaction has more than one tabstat
entry for the same relation.  The one that seems most difficult to fix is
that pgstat_recv_tabstat() clamps its n_live_tuples and n_dead_tuples values
to be nonnegative after adding in each delta received from a backend.  That
is a good idea because it prevents insane results if some messages get lost
--- but if a transaction's updates get randomly spread into several tabstat
items, the intermediate counts might get clamped, resulting in a wrong final
answer even though nothing was lost.

I also added some instrumentation printouts and found that in our
regression tests:

* about 10% of get_tabstat_entry() calls find an existing entry for the
relation OID.  This seems to happen only when a relcache entry gets flushed
mid-transaction, but that does happen, and not so infrequently either.

* about half of the transactions use as many as 20 tabstats, and 10% use
50 or more; but it drops off fast after that.  Almost no transactions use
as many as 100 tabstats.

It's not clear that these numbers are representative of typical database
applications, but they're something to start with anyway.

I also did some testing to compare the cost of get_tabstat_entry's linear
search against a dynahash.c table with OID key.  As I suspected, a hash
table would make this code a *lot* slower for small numbers of tabstat
entries: about a factor of 10 slower.  You need upwards of 100 tabstats
touched in a transaction before the hash table begins to pay for itself.
This is largely because dynahash doesn't have any cheap way to reset a
hashtable to empty, so you have to initialize and destroy the table for
each transaction.  By the time you've eaten that overhead, you've already
expended as many cycles as the linear search takes to handle several
dozen entries.

I conclude that if we wanted to do something about this, the most practical
solution would be to execute linear searches until we get to 100+ tabstat
entries in a transaction, and then build a hashtable for subsequent
searches.  However, it's exceedingly unclear that it will ever be worth
the effort or code space to do that.

			regards, tom lane
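
For illustration, here is a minimal standalone sketch (plain C, not the
actual pgstat_recv_tabstat() code; the numbers and the helper name are
invented) of the clamping hazard described above: clamping after each
per-entry delta can give a different answer than applying the
transaction's net delta once.

/*
 * Toy illustration only.  Stored n_live_tuples is 120 and the
 * transaction's net effect is -50, but suppose the backend reported it
 * as two tabstat entries for the same relation: -150 followed by +100.
 * Clamping after each delta yields 100 instead of the correct 70, even
 * though no message was lost.
 */
#include <stdio.h>

static long
apply_deltas_with_clamp(long stored, const long *deltas, int n)
{
	int			i;

	for (i = 0; i < n; i++)
	{
		stored += deltas[i];
		if (stored < 0)			/* clamp after every delta, as the */
			stored = 0;			/* collector does for each message */
	}
	return stored;
}

int
main(void)
{
	long		one_entry[] = {-50};			/* whole transaction in one entry */
	long		split_entries[] = {-150, 100};	/* same work split across two */

	printf("one entry:     %ld\n", apply_deltas_with_clamp(120, one_entry, 1));
	printf("split entries: %ld\n", apply_deltas_with_clamp(120, split_entries, 2));
	return 0;
}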
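
And here is a rough standalone sketch (again plain C, not PostgreSQL
code; the names, the 100-entry threshold, the fixed-size array, and the
toy chained hash are all invented for illustration) of the hybrid lookup
described in the last paragraph: linear search while the tabstat array is
small, switching to hash lookups once a transaction has touched enough
relations for the hashing overhead to pay for itself.

/*
 * Rough sketch only; error checking is omitted.  Search the tabstat
 * array linearly while it is small, and once a transaction has touched
 * HASH_THRESHOLD relations build a hash table over the existing entries
 * and use it for subsequent lookups.
 */
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct TabStat
{
	Oid			t_id;			/* relation OID */
	long		tuples_inserted;
	long		tuples_deleted;
} TabStat;

#define MAX_TABSTATS	4096
#define HASH_THRESHOLD	100		/* start hashing past this many entries */
#define HASH_BUCKETS	256		/* must be a power of two */

typedef struct HashNode
{
	struct HashNode *next;
	int			index;			/* index into tabstats[] */
} HashNode;

static TabStat tabstats[MAX_TABSTATS];
static int	n_tabstats = 0;
static HashNode *buckets[HASH_BUCKETS];
static int	hash_built = 0;

static unsigned int
oid_bucket(Oid oid)
{
	return oid & (HASH_BUCKETS - 1);
}

static void
hash_insert(int index)
{
	HashNode   *node = malloc(sizeof(HashNode));
	unsigned int b = oid_bucket(tabstats[index].t_id);

	node->index = index;
	node->next = buckets[b];
	buckets[b] = node;
}

static TabStat *
get_tabstat_entry(Oid rel_id)
{
	int			i;

	if (hash_built)
	{
		HashNode   *node;

		for (node = buckets[oid_bucket(rel_id)]; node; node = node->next)
			if (tabstats[node->index].t_id == rel_id)
				return &tabstats[node->index];
	}
	else
	{
		/* small transactions: a short linear scan beats any hashing setup */
		for (i = 0; i < n_tabstats; i++)
			if (tabstats[i].t_id == rel_id)
				return &tabstats[i];
	}

	/* not found: append a new entry */
	tabstats[n_tabstats].t_id = rel_id;
	tabstats[n_tabstats].tuples_inserted = 0;
	tabstats[n_tabstats].tuples_deleted = 0;

	if (!hash_built && n_tabstats + 1 >= HASH_THRESHOLD)
	{
		/* crossed the threshold: hash everything seen so far */
		for (i = 0; i <= n_tabstats; i++)
			hash_insert(i);
		hash_built = 1;
	}
	else if (hash_built)
		hash_insert(n_tabstats);

	return &tabstats[n_tabstats++];
}

int
main(void)
{
	Oid			oid;

	/* touch 150 "relations": the hash table gets built partway through */
	for (oid = 1; oid <= 150; oid++)
		get_tabstat_entry(oid)->tuples_inserted++;
	get_tabstat_entry(42)->tuples_deleted++;	/* repeat lookup uses the hash */
	return 0;
}

In a purpose-built sketch like this, resetting at transaction end would
only mean freeing the bucket chains and zeroing n_tabstats and
hash_built, so the setup cost is paid only by the rare transactions big
enough to benefit --- which is the cheap-reset property the dynahash
approach measured above lacks.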