Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-12-12 Thread tv
> Tomas Vondra writes: >> I've been thinking about this and I think it might be improved. If I >> understand the logic corretly, it says 'use half of the histogram bin >> size'. But the value > >> #define DEFAULT_RANGE_INEQ_SEL 0.005 > >> says it's always 0.5%, which is not not true if STATISTICS

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-12-12 Thread Tom Lane
Tomas Vondra writes: > Dne 17.11.2010 06:58, Tom Lane napsal(a): >>> BTW I think the default estimate used to be 1000, so it was changed in >>> one of the 8.x releases? Can you point me to the docs? I've even tried >>> to find that in the sources, but unsuccessfully. >> >> It's DEFAULT_RANGE_INEQ

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-12-11 Thread Tomas Vondra
Dne 17.11.2010 06:58, Tom Lane napsal(a): >> BTW I think the default estimate used to be 1000, so it was changed in >> one of the 8.x releases? Can you point me to the docs? I've even tried >> to find that in the sources, but unsuccessfully. > > It's DEFAULT_RANGE_INEQ_SEL, and AFAIR it hasn't cha

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tom Lane
Tomas Vondra writes: > Yes, I understand why MCV is not used in case of col_b, and I do > understand that the estimate may not be precise. But I'm wondering > what's a better estimate in such cases - 1, 5000, any constant, or > something related to a the histogram? It is doing it off the histogra

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tomas Vondra
> BTW I think the default estimate used to be 1000, so it was changed in > one of the 8.x releases? Can you point me to the docs? I've even tried > to find that in the sources, but unsuccessfully. OK, I've found it right after submitting the e-mail. It's defined in selfuncs.h as DEFAULT_RANGE_INE

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tomas Vondra
Dne 17.11.2010 05:22, Tom Lane napsal(a): > Tomas Vondra writes: >> Estimate for condition >>... WHERE (col_a BETWEEN 33 AND 33) >> is about 10k rows, which is quite precise. On the other side estimate >> for condition >>... WHERE (col_b BETWEEN 33 AND 33) >> is 1 row, which is very imprec

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tom Lane
Tomas Vondra writes: > Estimate for condition >... WHERE (col_a BETWEEN 33 AND 33) > is about 10k rows, which is quite precise. On the other side estimate > for condition >... WHERE (col_b BETWEEN 33 AND 33) > is 1 row, which is very imprecise (actual value is about 5000). That's an artif

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tomas Vondra
OK, thanks for the explanation. Cases (A), (B) and (D) are clear now. But I'm not sure about (C) ... Dne 17.11.2010 04:03, Tom Lane napsal(a): > Tomas Vondra writes: >> I'm not quite sure why (C) has an estimate of 1. > > It's smart enough to see that each of the clauses is a range constraint >

Re: [GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tom Lane
Tomas Vondra writes: > Results from (A) and (B) seem strange to me because AFAIK there are no > multi-column statistics available, and accoring to this thread > http://archives.postgresql.org/pgsql-hackers/2009-03/msg00052.php > the single-column estimates are not multiplied (which would be OK o

[GENERAL] strange row count estimates with conditions on multiple column

2010-11-16 Thread Tomas Vondra
Hi everyone, I've just noticed a strange behaviour when estimating row counts (I'm running 9.0.1). A small demonstration - let's create table with two columns, and fill it with data so that the columns are not independent: = --