Robert Haas <robertmh...@gmail.com> writes:
> ... how
> important is stability to ANALYZE?  If you *either* retake your MVCC
> snapshots periodically as you re-scan the table *or* use a non-MVCC
> snapshot for the scan, you can get those same kinds of artifacts: you
> might see two copies of a just-updated row, or none.  Maybe this would
> actually *break* something - e.g. could there be code that would get
> confused if we sample multiple rows for the same value in a column
> that has a UNIQUE index?  But I think mostly the consequences would be
> that you might get somewhat different results from the statistics.

Yeah, that's an excellent point.  I can imagine somebody complaining
"this query clearly matches a unique index, why is the planner estimating
multiple rows out?".  But most of the time it wouldn't matter much.
(And I think you can get cases like that anyway today.)

> It's not clear to me that it would even be correct to categorize those
> somewhat-different results as "less accurate."

Estimating two rows where the correct answer is one row is clearly
"less accurate".  But I suspect you'd have to be quite unlucky to
get such a result in practice from Simon's proposal, as long as we
weren't super-aggressive about changing ANALYZE's snapshot a lot.

			regards, tom lane
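
For readers following the thread, the "two copies of a just-updated row,
or none" artifact can be illustrated with a toy model.  This is only a
sketch under loud assumptions: the TupleVersion class, the integer
"snapshot", and the visible() rule below are hypothetical simplifications
invented for illustration, not PostgreSQL's actual visibility code.  An
UPDATE is modelled as stamping xmax on the old tuple version and adding a
new version with a matching xmin; a scan visits heap slots in physical
order and may use a different snapshot for each slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TupleVersion:
    """One physical version of a logical row (hypothetical toy model)."""
    key: str
    xmin: int                    # transaction id that created this version
    xmax: Optional[int] = None   # transaction id that superseded it, if any


def visible(t: TupleVersion, snapshot: int) -> bool:
    # Toy rule: a snapshot is just "all transactions with id <= snapshot
    # are committed and visible".  Real MVCC visibility is more involved.
    created = t.xmin <= snapshot
    superseded = t.xmax is not None and t.xmax <= snapshot
    return created and not superseded


def scan(heap: List[TupleVersion], snapshots: List[int]) -> List[str]:
    # Visit heap slots in physical order; snapshots[i] is the snapshot in
    # force when slot i is read.  Returns the keys of the sampled rows.
    return [t.key for t, snap in zip(heap, snapshots) if visible(t, snap)]


# Transaction 1 inserted row "k"; transaction 2 updated it.
old = TupleVersion("k", xmin=1, xmax=2)
new = TupleVersion("k", xmin=2)

# One snapshot for the whole scan: exactly one version is seen.
stable = scan([old, new], [1, 1])

# Snapshot retaken mid-scan, new version stored after the old one:
# the old version is seen under snapshot 1, the new one under snapshot 2.
doubled = scan([old, new], [1, 2])

# Same snapshot change, but the new version landed in an earlier slot:
# it is not yet visible when read, and the old one is dead by the time
# the scan reaches it.
missed = scan([new, old], [1, 2])

print(stable, doubled, missed)
```

In this toy setup the single-snapshot scan sees the row once, while the
scan that changes snapshots sees it twice or not at all depending on where
the new version sits physically.  Both artifacts need the update to commit
between reading the two slots, which fits the point above that you would
have to be fairly unlucky to hit this if the snapshot changes rarely.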