On Tue, Jan 18, 2011 at 4:53 AM,  <t...@fuzzy.cz> wrote:
> So the most important question is how to intercept the new/updated rows,
> and where to store them. I think each backend should maintain it's own
> private list of new records and forward them only in case of commit. Does
> that sound reasonable?
At the risk of sounding demoralizing, nothing about this proposal sounds very promising to me, and that sounds like a particularly bad idea. What happens if the transaction updates a billion records? Or even a million records? Are you going to store all of those in backend-local memory until commit time? Or spool them in a potentially-gigantic disk file somewhere? That memory allocation - or file - could grow to be larger than the size of the entire database in the worst case. And COMMIT could take an awfully long time if it has to spool megabytes or gigabytes of data off to some other process. And what happens if there's a crash after the COMMIT but before all that data is sent? The estimates become permanently wrong?

And are we doing all of this just to get a more accurate estimate of ndistinct? For the amount of effort that it will probably take to get this working at all, you could probably implement index-only scans and have enough bandwidth left over to tackle global temporary tables. And unless I'm missing something, the chances that the performance consequences will be tolerable are pretty much zero. And it would only benefit the tiny fraction of users for whom bad n_distinct estimates cause bad plans, and then the even tinier percentage of those who can't conveniently fix it by using the manual override that we already have in place - which presumably means people who have gigantic tables that are regularly rewritten with massive changes in the data distribution that affect plan choice. Is that more than the empty set?

Maybe the idea here is that this wouldn't fix just ndistinct estimates but would also help with multi-column statistics. Even if that's the case, I think it's almost certainly a dead end from a performance standpoint. Some kind of manual ANALYZE process that can be invoked when needed to scan the entire table would be painful but maybe acceptable for certain people with a problem they can't fix any other way, but doing this automatically for everyone seems like a really bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
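For reference, the manual override mentioned above is the per-column n_distinct storage parameter available since PostgreSQL 9.0. A minimal sketch, using hypothetical table and column names (it only illustrates the existing override, not anything from the proposal under discussion):

    -- Hypothetical table/column names; this is the existing per-column
    -- override, not part of the proposed ndistinct-tracking machinery.
    -- A positive value is taken as the number of distinct values directly;
    -- a negative value between -1 and 0 means a fixed fraction of the row count.
    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 50000);

    -- The override takes effect the next time the table is analyzed.
    ANALYZE orders;

The setting can be removed again with ALTER TABLE orders ALTER COLUMN customer_id RESET (n_distinct), after which the next ANALYZE reverts to the sampled estimate.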