Here's the query, obfuscated manually by me:

SELECT
    'Foo' AS system_function,
    stores.name AS store,
    lt.owner,
    lt.minute_of_day,
    lt.records
FROM foo.stores
LEFT OUTER JOIN (
    SELECT
        lts.store_pkey,
        lts.owner,
        date_trunc('minute', lts.date_gifted) AS minute_of_day,
        count(*) AS records
    FROM foo.gifts lts
    WHERE lts.date_added  >  '2017-07-14 11:13:05'
      AND lts.date_added  <  '2017-08-13 14:14:21'
      AND lts.date_gifted >= '2017-08-13 11:13:05'
      AND lts.date_gifted <  '2017-08-13 14:14:21'
    GROUP BY 1, 2, 3
    ORDER BY 1
) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's big
and very active. There is an index on date_added but not yet on
date_gifted.
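If we do add that index, on a table this busy we'd presumably have to build it without blocking writes. A sketch of what that could look like (the index name is made up):

```sql
-- CONCURRENTLY avoids taking a write-blocking lock on foo.gifts,
-- at the cost of a slower build; note it cannot run inside a
-- transaction block.
CREATE INDEX CONCURRENTLY gifts_date_gifted
    ON foo.gifts (date_gifted);
```

Raising the per-column statistics target (default 100) and re-analyzing might also help the planner with those range estimates, though I haven't confirmed that's the cause of the rows=75 weirdness:

```sql
ALTER TABLE foo.gifts
    ALTER COLUMN date_gifted SET STATISTICS 1000;
ANALYZE foo.gifts;
```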

I'm working to rewrite the query while the dev checks whether we even need
this query anymore.
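One direction I'm considering for the rewrite (an untested sketch against the obfuscated names above): the final WHERE lt.records IS NOT NULL discards every store row that found no match in the subquery, so the LEFT OUTER JOIN is effectively an inner join, count(*) can never be NULL for a group that was produced, and the ORDER BY inside the subquery is wasted work. Flattened:

```sql
SELECT
    'Foo' AS system_function,
    stores.name AS store,
    lts.owner,
    date_trunc('minute', lts.date_gifted) AS minute_of_day,
    count(*) AS records
FROM foo.gifts lts
JOIN foo.stores ON stores.pkey = lts.store_pkey
WHERE lts.date_added  >  '2017-07-14 11:13:05'
  AND lts.date_added  <  '2017-08-13 14:14:21'
  AND lts.date_gifted >= '2017-08-13 11:13:05'
  AND lts.date_gifted <  '2017-08-13 14:14:21'
GROUP BY 1, 2, 3, 4;
```

If the wide date_added bounds are only there to exploit the existing index, they could probably be dropped once date_gifted is indexed, but that would change the results for any rows where the two timestamps diverge, so I'd want the dev to confirm the intent first.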

On Wed, Aug 15, 2018 at 2:39 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 08/15/2018 12:31 PM, Don Seiler wrote:
>
>> PostgreSQL 9.6.6 on CentOS.
>>
>> We have a report query that has gone from maybe a few seconds to run to a
>> few minutes to run since mid-July. Looking at the output of EXPLAIN
>> ANALYZE, the row count estimates are way off, even though this table was
>> just analyzed a day or so ago. What's more bizarre to me is that the row
>> count estimate is *always* 75 for every node of the plan, where the actual
>> rows is in the hundreds or thousands. This table is one of the busiest
>> tables in our production database (many inserts and updates). It is
>> autovacuumed and autoanalyzed a few times per week, although I'm looking to
>> change it to a nightly manual schedule to avoid daytime autovacuums.
>>
>> Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual
>> time=179877.869..179878.011 rows=759 loops=1)
>>     Hash Cond: (stores.pkey = lt.store_pkey)
>>     Buffers: shared hit=1654593 read=331897 dirtied=249
>>     ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual
>> time=0.007..0.023 rows=78 loops=1)
>>           Buffers: shared hit=2
>>     ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual
>> time=179877.847..179877.847 rows=759 loops=1)
>>           Buckets: 1024  Batches: 1  Memory Usage: 73kB
>>           Buffers: shared hit=1654591 read=331897 dirtied=249
>>           ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75
>> width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
>>                 Buffers: shared hit=1654591 read=331897 dirtied=249
>>                 ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75
>> width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
>>                       Group Key: lts.store_pkey, lts.owner,
>> (date_trunc('minute'::text, lts.date_gifted))
>>                       Filter: (count(*) IS NOT NULL)
>>                       Buffers: shared hit=1654591 read=331897 dirtied=249
>>                       ->  Sort  (cost=1869138.59..1869138.78 rows=75
>> width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
>>                             Sort Key: lts.store_pkey,
>> lts.owner, (date_trunc('minute'::text, lts.date_gifted))
>>                             Sort Method: quicksort  Memory: 757kB
>>                             Buffers: shared hit=1654591 read=331897
>> dirtied=249
>>                             ->  Index Scan using gifts_date_added on
>> gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual
>> time=190.657..179870.165 rows=6731 loops=1)
>>                                   Index Cond: ((date_added > '2018-07-14
>> 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
>> 14:14:21'::timestamp without time zone))
>>                                   Filter: ((date_gifted >= '2018-08-13
>> 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
>> 14:14:21'::timestamp without time zone))
>>                                   Rows Removed by Filter: 938197
>>                                   Buffers: shared hit=1654591 read=331897
>> dirtied=249
>>   Planning time: 0.426 ms
>>   Execution time: 179893.894 ms
>>
>> I don't have a version of this query from prior to this summer, but
>> EXPLAIN plans for older data in older sandboxes show a similar
>> plan.
>>
>
> I don't have an answer, just a question:
>
> Can you provide the actual query and the table schema?
>
>
>> Sidenote: I am suggesting that an index be added on the date_gifted field
>> as that is far more selective and avoids throwing rows away. However I'm
>> very interested in why every node dealing with the gifts table thinks
>> rows=75 when the actual is much, much higher. And 75 seems like too round
>> of a number to be random?
>>
>> --
>> Don Seiler
>> www.seiler.us
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 
Don Seiler
www.seiler.us
