Greetings!

We are running postgres 11.9 (were running 11.7 prior to recent restart) on
a large db (10s of TB) with 5 or 6 tablespaces and 1000s of tables/indexes.

Within the past few days we have started to see a few queries running for
over 8 hours which we then attempt to terminate, but will not terminate.
These queries are also generating hundreds of thousands of tiny/empty temp
files.  In fact, before the restart there were over 23 million files in
pg_tmp which were removed.  We also have verified no server settings have
changed for at least a couple weeks, well before this issue started
happening only in the past few days.

Looking back awhile at the same query a couple weeks ago, we see this
(large file sizes):
2021-02-18 12:01:59.195
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,3,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.0"", size
21299836",,,,,,"
2021-02-18 12:01:59.446
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,4,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.5"", size
4138272",,,,,,"
2021-02-18 12:01:59.496
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,5,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.1"", size
10706416",,,,,,"
2021-02-18 12:01:59.747
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,6,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.3"", size
4150920",,,,,,"
2021-02-18 12:01:59.797
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,7,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.2"", size
10660908",,,,,,"
2021-02-18 12:02:00.050
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,8,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.4"", size
4132224",,,,,,"

But here is what we see presently (tiny files):
2021-03-05 21:30:52.712
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73562,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.177717"", size
24",,,,,,"
2021-03-05 21:30:52.735
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73563,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.990067"", size
92",,,,,,"
2021-03-05 21:30:52.950
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73564,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.490"", size
24",,,,,,"
2021-03-05 21:30:53.072
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73565,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.800016"", size
140",,,,,,"
2021-03-05 21:30:53.522
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73566,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,00000,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.500266"", size
48",,,,,,"

After the restart of the db and upgrade to 11.9, we see the same problem.

There are 2 separate queries that have had this behavior with seemingly
unrelated objects.  Here is the plan for one of them:


                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2654520.75..2654520.76 rows=1 width=336)
   ->  Hash Left Join  (cost=1087907.49..2019675.12 rows=14107680 width=69)
         Hash Cond: (t1.team_stadium_id = ra.team_stadium_id)
         ->  Hash Left Join  (cost=1079740.68..1955870.37 rows=14107680
width=69)
               Hash Cond: (t1.team_stadium_id = frg.team_stadium_id)
               ->  Hash Right Join  (cost=1073614.70..1894916.60
rows=14107680 width=69)
                     Hash Cond: (sm_1.id = t1.id)
                     ->  Hash Join  (cost=47373.90..695007.85 rows=170011
width=4)
                           Hash Cond: (sm_1.id = l.id)
                           ->  Index Scan using
index_foo_fact_on_id_first_win on foo_fact sm_1  (cost=0.42..647177.59
rows=174010 width=12)
                                 Filter: (CASE WHEN
(basketball_renewal_date <= now()) THEN true ELSE false END AND (NOT CASE
WHEN (basketball_renewal_date <= (now() - '5 days'::interval)) THEN false
ELSE true END))
                           ->  Hash  (cost=34408.14..34408.14 rows=1037227
width=4)
                                 ->  Index Only Scan using
index_wins_on_team_covering on wins l  (cost=0.43..34408.14 rows=1037227
width=4)
                                       Filter: (team_status_id <> ALL
('{1,2}'::integer[]))
                     ->  Hash  (cost=684569.80..684569.80 rows=14107680
width=69)
                           ->  Seq Scan on bar_team_stadiums_info t1
 (cost=0.00..684569.80 rows=14107680 width=69)
               ->  Hash  (cost=3720.99..3720.99 rows=192399 width=4)
                     ->  Seq Scan on bar_team_stadiums_frg frg
 (cost=0.00..3720.99 rows=192399 width=4)
         ->  Hash  (cost=4749.14..4749.14 rows=273414 width=8)
               ->  Seq Scan on sports_team_stadiums ra  (cost=0.00..4749.14
rows=273414 width=8)
(20 rows)


Any ideas here would be much appreciated!!!

Thanks,
Jeremy

Reply via email to