Re-sending from a subscribed address (sigh).

On Jun 7, 2018, at 4:18 PM, Robert <dom...@logicalchaos.org> wrote:


> On Jun 7, 2018, at 2:15 PM, Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> 
> Robert Creager wrote:
>> I have a system running FreeBSD 11.1-STABLE, PostgreSQL 9.6.8, Java OpenJDK 
>> 1.8.0_131, and JDBC 9.3-1104-jdbc41 which is exhibiting very bizarre behavior.
>> A query is executing against a couple of tables that have 1 and 0 records in 
>> them: ds3.job_entry has 0 rows, ds3.blob has 1 row.
>> If I execute the query manually via the command line, it executes fine.  There 
>> are no other active queries, no locks.
>> The system is idle, in our Dev Test group, so I can leave it this way for a 
>> bit of time.
>> The general software setup is in production and I’ve seen nothing like this 
>> before.  Even a system with 300M ds3.blob entries executes this query fine.
>> 
>> Jun  7 17:24:21 blackpearl postgres[10670]: [7737-1] 
>> db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG:  
>> duration: 2903612.206 ms  execute fetch from S_2037436/C_2037437: SELECT * 
>> FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = 
>> ds3.blob.id AND (job_id = $1))
>> 
>> tapesystem=# explain analyze SELECT * FROM ds3.blob WHERE EXISTS (SELECT *
>> FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND
>> (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'));
>>                                 QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop  (cost=0.84..6.89 rows=1 width=77) (actual time=0.044..0.044 rows=0 loops=1)
>>   ->  Index Scan using job_entry_job_id_idx on job_entry  (cost=0.42..2.44 rows=1 width=16) (actual time=0.042..0.042 rows=0 loops=1)
>>         Index Cond: (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'::uuid)
>>   ->  Index Scan using blob_pkey on blob  (cost=0.42..4.44 rows=1 width=77) (never executed)
>>         Index Cond: (id = job_entry.blob_id)
>> Planning time: 0.388 ms
>> Execution time: 0.118 ms
>> 
>> 
>> pid  client_port     runtime query_start     datname state   query   usename
>> 10670        11211   0 years 0 mons 0 days 0 hours 43 mins 28.852273 secs
>> 2018-06-07 17:24:22.026384      tapesystem      active  SELECT * FROM
>> ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id =
>> ds3.blob.id AND (job_id = $1))   Administrator
>> 
>> 
>> From the system with 300M ds3.blob entries, which works fine (along with 
>> every other system in house):
>> 
>> QUERY PLAN
>> Nested Loop  (cost=0.57..738052.90 rows=164379 width=75) (actual time=1.001..1947.029 rows=50000 loops=1)
>>  ->  Seq Scan on job_entry  (cost=0.00..10039.50 rows=164379 width=16) (actual time=0.871..56.442 rows=50000 loops=1)
>>        Filter: (job_id = 'ef27d2fa-2727-424e-8f44-da9e33a5ca05'::uuid)
>>        Rows Removed by Filter: 60001
>>  ->  Index Scan using blob_pkey on blob  (cost=0.57..4.42 rows=1 width=75) (actual time=0.037..0.037 rows=1 loops=50000)
>>        Index Cond: (id = job_entry.blob_id)
>> Planning time: 6.330 ms
>> Execution time: 1951.564 ms
> 
> The JDBC query is using a cursor since you are using setFetchSize(10000).
> 
> I can see two reasons why the plan might be different:
> 
> 1. It is a prepared statement that has been executed often enough
>   for the generic plan to kick in, and the generic plan is bad.

Tom brought that up; see my response to him.  It is a bad plan, but on a table 
with no entries.

> 2. PostgreSQL chooses a fast startup plan because a cursor is used,
>   and that plan performs much worse.

The parameterized plan Tom had me look at starts with sequential scans, which 
would be bad on the job_entry table (topping out at ~9M entries in the worst 
case) and horrendous on the blob table (open-ended size; we test with 300M 
entries on one of our systems).

> To see if 1. causes the problem, you could set
>  m_statement.setPrepareThreshold(0);
> and see if that makes a difference.

If I can keep this machine and spin some code, I could try this.  And if this 
is a JDBC/app problem, restarting the app would ‘fix’ the problem, I’m guessing.
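If spinning new code is awkward, the same thing can be tried without touching
the code path at all: pgJDBC's prepareThreshold connection-URL parameter set to
0 disables server-side prepared statements for the whole connection (a sketch;
the host, port, and database name here are placeholders):

```
jdbc:postgresql://localhost:5432/tapesystem?prepareThreshold=0
```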

> To test if 2. has an influence, try setting cursor_tuple_fraction to 1.0
> and see if it makes a difference.
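A minimal session-level test of this, assuming the statement can be run from
psql (or issued on the JDBC session before the query):

```sql
-- Tell the planner to optimize cursor queries for total runtime rather
-- than fast startup (session-local; no restart needed):
SET cursor_tuple_fraction = 1.0;
```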
> 
> You can capture the execution plan of the bad statement by using auto_explain,
> that would certainly shed more light on the problem.

I’ll look at enabling that.  I couldn’t do it easily: it would mean either 
restarting postgres (changing the postgresql.conf file) or spinning new code 
(possible, not ideal).
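For reference, the postgresql.conf change would look roughly like this (a
sketch; the thresholds are illustrative, and shared_preload_libraries does
require a restart, though auto_explain can also be loaded per-session with
LOAD by a superuser, avoiding the restart):

```
shared_preload_libraries = 'auto_explain'  # requires a server restart
auto_explain.log_min_duration = '5s'       # log plans for statements slower than this
auto_explain.log_analyze = on              # include actual timings/row counts, like EXPLAIN ANALYZE
```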

Best,
Robert
