On Fri, Nov 15, 2019 at 2:45 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Nov 14, 2019 at 5:20 PM Craig James <cja...@emolecules.com> wrote:
>
>> I'm completely baffled by this problem: I'm doing a delete that joins
>> three modest-sized tables, and it gets completely stuck: 100% CPU use
>> forever. Here's the query:
>>
>>  Aggregate  (cost=193.54..193.55 rows=1 width=8)
>>    ->  Nested Loop Semi Join  (cost=0.84..193.54 rows=1 width=0)
>>          Join Filter: (categories.id = c.id)
>>          ->  Index Scan using i_categories_category_id on categories
>> (cost=0.42..2.44 rows=1 width=4)
>>                Index Cond: (category_id = 23)
>>          ->  Nested Loop Anti Join  (cost=0.42..191.09 rows=1 width=4)
>>                Join Filter: (c.id = st.id)
>>                ->  Index Scan using i_categories_category_id on categories
>> c  (cost=0.42..2.44 rows=1 width=4)
>>                      Index Cond: (category_id = 23)
>>                ->  Seq Scan on category_staging_23 st  (cost=0.00..99.40
>> rows=7140 width=4)
>
> If the estimates were correct, this shouldn't be slow. But how can it
> screw up the estimate for this by this much, when the conditions are so
> simple? How many rows are there actually in categories where
> category_id=23?

I actually waited long enough for this to finish an "explain analyze".
Here it is, preceded by stats about the table that I added to the program:

  10000 items in table registry.category_staging_8
 274602 items in table registry.categories
 309398 items in table registry.smiles
  10000 items in joined registry.category_staging_8 / registry.categories

 Aggregate  (cost=274.90..274.91 rows=1 width=8) (actual
time=7666916.832..7666916.832 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.84..274.89 rows=1 width=0) (actual
time=7666916.829..7666916.829 rows=0 loops=1)
         Join Filter: (categories.id = c.id)
         ->  Index Scan using i_categories_category_id on categories
(cost=0.42..2.44 rows=1 width=4) (actual time=0.015..6.192 rows=5000
loops=1)
               Index Cond: (category_id = 8)
         ->  Nested Loop Anti Join  (cost=0.42..272.44 rows=1 width=4)
(actual time=1533.380..1533.380 rows=0 loops=5000)
               Join Filter: (c.id = st.id)
               Rows Removed by Join Filter: 12497500
               ->  Index Scan using i_categories_category_id on categories
c  (cost=0.42..2.44 rows=1 width=4) (actual time=0.017..1.927 rows=5000
loops=5000)
                     Index Cond: (category_id = 8)
               ->  Seq Scan on category_staging_8 st  (cost=0.00..145.00
rows=10000 width=4) (actual time=0.003..0.153 rows=2500 loops=25000000)
 Planning time: 0.311 ms
 Execution time: 7666916.865 ms

BTW, I'll note at this point that "analyze category_staging_8" prior to
this query made no difference.

> What do you see in `select * from pg_stats where tablename='categories'
> and attname='category_id' \x\g\x`?

db=> select * from pg_stats where tablename='categories' and
attname='category_id' \x\g\x;
Expanded display is on.
-[ RECORD 1 ]----------+-------------------------------------------------
schemaname             | registry
tablename              | categories
attname                | category_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 21
most_common_vals       | {4,3,2,10,11,13,12,16,9,6,7,5,15,23,14,25,24,1,26,28,27}
most_common_freqs      | {0.2397,0.159933,0.0926667,0.0556,0.0555667,0.0546333,0.0525333,0.0439,0.0426667,0.0346333,0.0331,0.0302333,0.0288333,0.0240667,0.0224,0.0122333,0.011,0.0035,0.00233333,0.000366667,0.0001}
histogram_bounds       |
correlation            | -0.0200765
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
-[ RECORD 2 ]----------+-------------------------------------------------
schemaname             | test
tablename              | categories
attname                | category_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 11
most_common_vals       | {10,30,50,1,2,0,3,9,6,7,5}
most_common_freqs      | {0.132051,0.132051,0.132051,0.10641,0.0935897,0.0807692,0.0807692,0.0807692,0.0769231,0.0551282,0.0294872}
histogram_bounds       |
correlation            | -0.435298
most_common_elems      |
most_common_elem_freqs |

> Since it thinks the seq scan of category_staging_23 is only going to
> happen once (at the bottom of two nested loops, but each executing just
> once) it sees no benefit in hashing that table. Of course it is actually
> happening a lot more than once.

Yeah, 25 million times to be exact: 5,000 rows from the outer index scan
times 5,000 rows from the inner one, per the plan above.

> Cheers,
>
> Jeff

Thanks,
Craig

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------
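P.S. For anyone tallying the numbers in this thread: the 25-million figure falls straight out of the loop counts in the EXPLAIN ANALYZE plan. A minimal arithmetic sketch (all values copied from the plan; the 2,500 average rows per scan is the `rows=2500` reported on the seq scan node):

```python
# Loop counts taken from the EXPLAIN ANALYZE plan earlier in the thread.
outer_index_scan_rows = 5000   # categories, category_id = 8 (loops=1)
inner_index_scan_rows = 5000   # categories c, per outer row (loops=5000)

# The seq scan on category_staging_8 runs once per (outer, inner) row pair:
seq_scan_executions = outer_index_scan_rows * inner_index_scan_rows
print(seq_scan_executions)     # 25000000, matching loops=25000000

# Each execution reads ~2500 of the 10000 staging rows on average before
# the anti join finds a match and stops, so the total number of rows
# pushed through the join filter is enormous:
rows_scanned = seq_scan_executions * 2500
print(rows_scanned)            # 62500000000 -- ~62.5 billion rows
```

At roughly 62.5 billion row comparisons, the 7,667-second runtime stops being mysterious; the plan was cheap only because the planner expected each of those loops to execute once.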