Select count(*) on a 2B Rows Tables Takes ~20 Hours

Fd Habash Thu, 13 Sep 2018 10:34:29 -0700

Based on my research in the forums and Google , it is described in multiple 
places that ‘select count(*)’ is expected to be slow in Postgres because of the 
MVCC controls imposed upon the query leading a table scan. Also, the elapsed 
time increase linearly with table size.


However, I do not know if elapsed time I’m getting is to be expected. 

Table reltuples in pg_class = 2,266,649,344 (pretty close)
Query = select count(*) from jim.sttyations ;
Elapsed time (ET) = 18.5 hrs

This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g). CPU usage during 
count run hovers around 20% with 20g of freeable memory. 

Is this ET expected? If not, what could be slowing it down? I’m currently 
running explain analyze and I’ll share the final output when done. 

I’m familiar with the ideas listed here 
https://www.citusdata.com/blog/2016/10/12/count-performance/ 

Table "jim.sttyations"
      Column       |           Type           |         Modifiers          | 
Storage  | Stats target | Description 
-------------------+--------------------------+----------------------------+----------+--------------+-------------
 stty_id            | bigint                   | not null                   | 
plain    |              | 
 stty_hitlist_line  | text                     | not null                   | 
extended |              | 
 stty_status        | text                     | not null default 'Y'::text | 
extended |              | 
 stty_status_date   | timestamp with time zone | not null                   | 
plain    |              | 
 vs_number         | integer                  | not null                   | 
plain    |              | 
 stty_date_created  | timestamp with time zone | not null                   | 
plain    |              | 
 stty_stty_id        | bigint                   |                            | 
plain    |              | 
 stty_position      | bigint                   |                            | 
plain    |              | 
 mstty_id           | bigint                   |                            | 
plain    |              | 
 vsr_number        | integer                  |                            | 
plain    |              | 
 stty_date_modified | timestamp with time zone |                            | 
plain    |              | 
 stty_stored        | text                     | not null default 'N'::text | 
extended |              | 
 stty_sequence      | text                     |                            | 
extended |              | 
 stty_hash          | text                     |                            | 
extended |              | 
Indexes:
    "stty_pk" PRIMARY KEY, btree (stty_id)
    "stty_indx_fk01" btree (stty_stty_id)
    "stty_indx_fk03" btree (vsr_number)
    "stty_indx_fk04" btree (vs_number)
    "stty_indx_pr01" btree (mstty_id, stty_id)
Check constraints:
    "stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))
    "stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))
Foreign-key constraints:
    "stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) 
NOT VALID
    "stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES 
master_sttyations(mstty_id)
    "stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES 
valid_status_reasons(vsr_number)

----------------
Thank you


refpep-> select count(*) from jim.sttyations; 
                                                    QUERY PLAN                  
                                  
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)
   Output: count(*)
   ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  
(cost=0.58..67784668.41 rows=2266649344 width=0)
         Output: vsr_number
(4 rows)

Select count(*) on a 2B Rows Tables Takes ~20 Hours

Reply via email to