On Fri, Nov 15, 2019 at 2:45 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Nov 14, 2019 at 5:20 PM Craig James <cja...@emolecules.com> wrote:
>
>> I'm completely baffled by this problem: I'm doing a delete that joins
>> three modest-sized tables, and it gets completely stuck: 100% CPU use
>> forever. Here's the query:
>>
>>  Aggregate  (cost=193.54..193.55 rows=1 width=8)
>>    ->  Nested Loop Semi Join  (cost=0.84..193.54 rows=1 width=0)
>>          Join Filter: (categories.id = c.id)
>>          ->  Index Scan using i_categories_category_id on categories
>> (cost=0.42..2.44 rows=1 width=4)
>>                Index Cond: (category_id = 23)
>>          ->  Nested Loop Anti Join  (cost=0.42..191.09 rows=1 width=4)
>>                Join Filter: (c.id = st.id)
>>                ->  Index Scan using i_categories_category_id on categories
>> c  (cost=0.42..2.44 rows=1 width=4)
>>                      Index Cond: (category_id = 23)
>>                ->  Seq Scan on category_staging_23 st  (cost=0.00..99.40
>> rows=7140 width=4)
>
> If the estimates were correct, this shouldn't be slow. But how can it
> screw up the estimate for this by this much, when the conditions are so
> simple? How many rows are there actually in categories where
> category_id=23?

I actually waited long enough for this to finish an "explain analyze".
Here it is, preceded by stats about the table that I added to the program:

  10000 items in table registry.category_staging_8
 274602 items in table registry.categories
 309398 items in table registry.smiles
  10000 items in joined registry.category_staging_8 / registry.categories

 Aggregate  (cost=274.90..274.91 rows=1 width=8) (actual
time=7666916.832..7666916.832 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.84..274.89 rows=1 width=0) (actual
time=7666916.829..7666916.829 rows=0 loops=1)
         Join Filter: (categories.id = c.id)
         ->  Index Scan using i_categories_category_id on categories
(cost=0.42..2.44 rows=1 width=4) (actual time=0.015..6.192 rows=5000
loops=1)
               Index Cond: (category_id = 8)
         ->  Nested Loop Anti Join  (cost=0.42..272.44 rows=1 width=4)
(actual time=1533.380..1533.380 rows=0 loops=5000)
               Join Filter: (c.id = st.id)
               Rows Removed by Join Filter: 12497500
               ->  Index Scan using i_categories_category_id on categories
c  (cost=0.42..2.44 rows=1 width=4) (actual time=0.017..1.927 rows=5000
loops=5000)
                     Index Cond: (category_id = 8)
               ->  Seq Scan on category_staging_8 st  (cost=0.00..145.00
rows=10000 width=4) (actual time=0.003..0.153 rows=2500 loops=25000000)
 Planning time: 0.311 ms
 Execution time: 7666916.865 ms

BTW, I'll note at this point that "analyze category_staging_8" prior to
this query made no difference.

> What do you see in `select * from pg_stats where tablename='categories'
> and attname='category_id' \x\g\x`?

db=> select * from pg_stats where tablename='categories' and
attname='category_id' \x\g\x;
Expanded display is on.
-[ RECORD 1 ]----------+-------------------------------------------------
schemaname             | registry
tablename              | categories
attname                | category_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 21
most_common_vals       | {4,3,2,10,11,13,12,16,9,6,7,5,15,23,14,25,24,1,26,28,27}
most_common_freqs      | {0.2397,0.159933,0.0926667,0.0556,0.0555667,0.0546333,0.0525333,0.0439,0.0426667,0.0346333,0.0331,0.0302333,0.0288333,0.0240667,0.0224,0.0122333,0.011,0.0035,0.00233333,0.000366667,0.0001}
histogram_bounds       |
correlation            | -0.0200765
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
-[ RECORD 2 ]----------+-------------------------------------------------
schemaname             | test
tablename              | categories
attname                | category_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 11
most_common_vals       | {10,30,50,1,2,0,3,9,6,7,5}
most_common_freqs      | {0.132051,0.132051,0.132051,0.10641,0.0935897,0.0807692,0.0807692,0.0807692,0.0769231,0.0551282,0.0294872}
histogram_bounds       |
correlation            | -0.435298
most_common_elems      |
most_common_elem_freqs |

> Since it thinks the seq scan of category_staging_23 is only going to
> happen once (at the bottom of two nested loops, but each executing just
> once) it sees no benefit in hashing that table. Of course it is actually
> happening a lot more than once.

Yeah, 25 million times to be exact: 5,000 rows from the outer index scan
times 5,000 rows from the inner one, per the plan above.

> Cheers,
>
> Jeff

Thanks,
Craig

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------
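P.S. For anyone tallying the numbers in this thread: the 25-million figure falls straight out of the loop counts in the EXPLAIN ANALYZE plan. A minimal arithmetic sketch (all values copied from the plan; the 2,500 average rows per scan is the `rows=2500` reported on the seq scan node):

```python
# Loop counts taken from the EXPLAIN ANALYZE plan earlier in the thread.
outer_index_scan_rows = 5000   # categories, category_id = 8 (loops=1)
inner_index_scan_rows = 5000   # categories c, per outer row (loops=5000)

# The seq scan on category_staging_8 runs once per (outer, inner) row pair:
seq_scan_executions = outer_index_scan_rows * inner_index_scan_rows
print(seq_scan_executions)     # 25000000, matching loops=25000000

# Each execution reads ~2500 of the 10000 staging rows on average before
# the anti join finds a match and stops, so the total number of rows
# pushed through the join filter is enormous:
rows_scanned = seq_scan_executions * 2500
print(rows_scanned)            # 62500000000 -- ~62.5 billion rows
```

At roughly 62.5 billion row comparisons, the 7,667-second runtime stops being mysterious; the plan was cheap only because the planner expected each of those loops to execute once.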