Problem solved ... see below. Thanks everyone for your suggestions and
insights!

On Sat, Nov 16, 2019 at 7:16 AM Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Fri, Nov 15, 2019 at 7:27 PM Craig James <cja...@emolecules.com> wrote:
>
>> On Fri, Nov 15, 2019 at 2:45 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
>> BTW, I'll note at this point that "analyze category_staging_8" prior to
>> this query made no difference.
>>
>
> Isn't that the wrong table to have analyzed? The offender here is
> "categories", not  "category_staging_8".  Is this some sort of inheritance
> situation?
>
>
>>
>>> What do you see in `select * from pg_stats where tablename='categories'
>>> and attname='category_id' \x\g\x`?
>>>
>>
>> db=> select * from pg_stats where tablename='categories' and
>> attname='category_id' \x\g\x;
>> Expanded display is on.
>>
>
>
>> ...
>>
> n_distinct             | 21
>> most_common_vals       |
>> {4,3,2,10,11,13,12,16,9,6,7,5,15,23,14,25,24,1,26,28,27}
>> most_common_freqs      |
>> {0.2397,0.159933,0.0926667,0.0556,0.0555667,0.0546333,0.0525333,0.0439,0.0426667,0.0346333,0.0331,0.0302333,0.0288333,0.0240667,0.0224,0.0122333,0.011,0.0035,0.00233333,0.000366667,0.0001}
>>
>
> There is a path in the analyze code where if the least-seen value in the
> sample was seen more than once (i.e. no value was seen exactly once) then
> it assumes that the seen values are all the values that exist.  I think the
> logic behind that is dubious.  I think it is pretty clear that that is
> kicking in here.  But why?  I think the simple answer is that you analyzed
> the wrong table, and the statistics shown here might be accurate for some
> time in the past but are no longer accurate.  It is hard to see how a value
> present 5000 times in a table of 274602 rows could have evaded sampling if
> they were present at the time the sample was done.
>

As I mentioned in a reply to Andreas, I also added an "analyze ..." to the
other two tables as an experiment. It made no difference. However ...

Your comment about the missing 5000 values solved the problem: those values
had only just been inserted by the previous SQL statement, inside the same
transaction. The code reconciles two collections across two different
servers: first it inserts all new values, then it deletes obsolete values.
So the "select ..." in question includes the very 5000 rows that were just
inserted.

I added an "analyze" between the insert and the delete. Instant fix.
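For the archives, the reconciliation now looks roughly like this (a
simplified sketch only; the column list and the "name" column are
placeholders I made up, since only category_id appears in this thread):

    begin;

    -- Insert rows present in the staging table but not yet in categories.
    -- (Hypothetical column list; only category_id is known from the thread.)
    insert into categories (category_id, name)
    select s.category_id, s.name
    from category_staging_8 s
    where not exists (
        select 1 from categories c where c.category_id = s.category_id
    );

    -- Refresh statistics inside the transaction so the planner sees the
    -- rows just inserted before it plans the delete below.
    analyze categories;

    -- Delete rows that no longer exist on the other server.
    delete from categories c
    where not exists (
        select 1 from category_staging_8 s where s.category_id = c.category_id
    );

    commit;

Unlike VACUUM, ANALYZE can run inside a transaction block, so it slots in
between the two statements without any restructuring.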

It also solves one other mystery: this query only caused problems on the
small test system, and has been working well on a production database with
about 100x more data. In production, each "category" is already populated
with a significant amount of data, so the statistics there are already
good and one insert/delete doesn't change them meaningfully.


> Cheers,
>
> Jeff
>

Thanks!
Craig
