Jeff Janes <jeff.ja...@gmail.com> writes:
> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> It does know it, what it doesn't know is how many duplicates there are.

> Does it know whether the count comes from a parsed query-string list/array,
> rather than being an estimate from something else?  If it came from a join,
> I can see why it would be dangerous to assume they are mostly distinct.
> But if someone throws 6000 things into a query string and only 200 distinct
> values among them, they have no one to blame but themselves when it makes
> bad choices off of that.

I am not exactly sold on this assumption that applications have
de-duplicated the contents of a VALUES or IN list.  They haven't been
asked to do that in the past, so why do you think they are doing it?

>> If we do what I think you're suggesting, which is assume the entries are
>> all distinct, I'm afraid we'll just move the estimation problems somewhere
>> else.

> Any guesses as to where?  (other than the case of someone doing something
> silly with their query strings?)

Well, overestimates are as bad as underestimates --- it might lead us away
from using a nestloop, for example.

			regards, tom lane
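
To put rough numbers on the overestimate concern, here is a toy sketch in Python. It is not PostgreSQL's planner code: the function `estimated_rows` and the `rows_per_value` figure are hypothetical, and the model simply assumes each distinct list value matches a fixed number of table rows. Only the 6000 and 200 figures come from the thread.

```python
def estimated_rows(list_length, assumed_distinct, rows_per_value):
    """Toy row estimate for `col IN (list)`: distinct values times rows per value."""
    # The distinct count can never exceed the number of list entries.
    distinct = min(list_length, assumed_distinct)
    return distinct * rows_per_value


list_length = 6000    # entries in the query string (figure from the thread)
true_distinct = 200   # distinct values among them (figure from the thread)
rows_per_value = 5    # hypothetical: average table rows matching one value

# Estimate if the true number of distinct values were known.
actual = estimated_rows(list_length, true_distinct, rows_per_value)

# Estimate if every list entry is assumed to be distinct.
assume_all_distinct = estimated_rows(list_length, list_length, rows_per_value)

overestimate_factor = assume_all_distinct / actual
print(actual, assume_all_distinct, overestimate_factor)
```

Under these assumptions, treating all 6000 entries as distinct inflates the row estimate by the ratio of list length to true distinct count, which is the kind of overestimate that could push a planner away from a nestloop.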