Problem solved ... see below. Thanks everyone for your suggestions and insights!
On Sat, Nov 16, 2019 at 7:16 AM Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Fri, Nov 15, 2019 at 7:27 PM Craig James <cja...@emolecules.com> wrote:
>
>> On Fri, Nov 15, 2019 at 2:45 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
>>
>> BTW, I'll note at this point that "analyze category_staging_8" prior to
>> this query made no difference.
>
> Isn't that the wrong table to have analyzed? The offender here is
> "categories", not "category_staging_8". Is this some sort of inheritance
> situation?
>
>>> What do you see in `select * from pg_stats where tablename='categories'
>>> and attname='category_id' \x\g\x`?
>>
>> db=> select * from pg_stats where tablename='categories' and
>> attname='category_id' \x\g\x;
>> Expanded display is on.
>> ...
>> n_distinct        | 21
>> most_common_vals  | {4,3,2,10,11,13,12,16,9,6,7,5,15,23,14,25,24,1,26,28,27}
>> most_common_freqs | {0.2397,0.159933,0.0926667,0.0556,0.0555667,0.0546333,0.0525333,0.0439,0.0426667,0.0346333,0.0331,0.0302333,0.0288333,0.0240667,0.0224,0.0122333,0.011,0.0035,0.00233333,0.000366667,0.0001}
>
> There is a path in the analyze code where if the least-seen value in the
> sample was seen more than once (i.e. no value was seen exactly once) then
> it assumes that the seen values are all the values that exist. I think the
> logic behind that is dubious. I think it is pretty clear that that is
> kicking in here. But why? I think the simple answer is that you analyzed
> the wrong table, and the statistics shown here might be accurate for some
> time in the past but are no longer accurate. It is hard to see how a value
> present 5000 times in a table of 274602 rows could have evaded sampling if
> they were present at the time the sample was done.

As I mentioned in a reply to Andreas, I also added an "analyze ..." on the
other two tables as an experiment. It made no difference. However ...

Your comment about the missing 5000 values solved the problem: those values
were only inserted by the previous SQL statement, inside the same
transaction. The code is reconciling two collections across two different
servers: first it inserts all new values, then it deletes obsolete ones. So
the "select ..." in question includes the very 5000 rows that were just
inserted. I added an "analyze" between the insert and the delete. Instant
fix.

It also solves one other mystery: this query only caused problems on the
small test system, and had been working well on a production database with
about 100x more data. In production, each "category" is already populated
with a significant amount of data, so the production system already has good
statistics and this one insert/delete doesn't change them.

> Cheers,
>
> Jeff

Thanks!
Craig
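
P.S. For the archives, here's a minimal sketch of the fixed reconcile step.
The real table definitions and predicates aren't shown above, so the
"item_id" column and the category_id = 8 filter below are placeholders; the
point is only the placement of the ANALYZE between the insert and the
delete (ANALYZE is allowed inside a transaction block, unlike VACUUM):

  BEGIN;

  -- 1. Insert rows present in the staging table but not yet in categories.
  INSERT INTO categories (category_id, item_id)
  SELECT s.category_id, s.item_id
    FROM category_staging_8 s
   WHERE NOT EXISTS (SELECT 1
                       FROM categories c
                      WHERE c.category_id = s.category_id
                        AND c.item_id = s.item_id);

  -- 2. Refresh statistics so the planner sees the rows just inserted;
  --    without this, the DELETE below was planned against stale pg_stats.
  ANALYZE categories;

  -- 3. Delete rows for this category that are no longer in the staging set.
  DELETE FROM categories c
   WHERE c.category_id = 8
     AND NOT EXISTS (SELECT 1
                       FROM category_staging_8 s
                      WHERE s.item_id = c.item_id);

  COMMIT;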