Your name               : Patrick Clery
Your email address      : [EMAIL PROTECTED]


System Configuration
---------------------
  Architecture (example: Intel Pentium)         : Intel
  Operating System (example: Linux 2.4.18)      : FreeBSD 4.10-STABLE
  PostgreSQL version (example: PostgreSQL-8.0)  : PostgreSQL-8.0
  Compiler used (example:  gcc 2.95.2)          : gcc 2.95.4


Please enter a FULL description of your problem:
------------------------------------------------

This query appears to enter an infinite loop and fill up my /usr partition 
(PGDATA=/usr/local/pgsql/data) at a rapid rate:

UPDATE people_locations
SET postalcode = foo.pcode, city_id = foo.cid
FROM (
    SELECT
        p.postalcode AS pcode,
        p.city_id AS cid,
        c.state_id AS sid
    FROM postalcodes p
    JOIN cities c USING (city_id)
) foo
WHERE foo.sid = state_id AND old_postalcode = foo.pcode;

psql:/usr/local/www/beano/datingsite/sql/import_people.sql:363: ERROR:  could not write to hash-join temporary file: No space left on device
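For what it's worth, the same update can be written without the derived table, which makes the join conditions a little easier to audit. This should be an equivalent rewrite (I have not verified whether it produces a different plan):

```sql
-- Equivalent rewrite without the subselect (not verified to plan differently)
UPDATE people_locations
SET postalcode = p.postalcode, city_id = p.city_id
FROM postalcodes p
JOIN cities c USING (city_id)
WHERE c.state_id = people_locations.state_id
  AND people_locations.old_postalcode = p.postalcode;
```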


From when the query is first run (oddly, the free space actually goes up at first):
 
$ while : ; do df -h /usr/; sleep 3; done
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.8G   7.8G    43%    /usr
/dev/ad0s1g    15G   5.2G   8.3G    39%    /usr
/dev/ad0s1g    15G   5.1G   8.4G    38%    /usr
/dev/ad0s1g    15G   5.2G   8.4G    38%    /usr
/dev/ad0s1g    15G   5.2G   8.3G    39%    /usr
/dev/ad0s1g    15G   5.3G   8.3G    39%    /usr
/dev/ad0s1g    15G   5.3G   8.3G    39%    /usr
/dev/ad0s1g    15G   5.4G   8.2G    40%    /usr
/dev/ad0s1g    15G   5.4G   8.2G    40%    /usr
/dev/ad0s1g    15G   5.5G   8.1G    40%    /usr
/dev/ad0s1g    15G   5.5G   8.1G    41%    /usr
/dev/ad0s1g    15G   5.6G   8.0G    41%    /usr
/dev/ad0s1g    15G   5.6G   8.0G    41%    /usr
/dev/ad0s1g    15G   5.7G   7.9G    42%    /usr
... and on and on until the available space reaches zero.
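If it helps narrow things down, the space is presumably going into the hash-join spill files. Something like this should show them growing while the query runs (the `pgsql_tmp` location under each per-database directory is my assumption about the 8.0 on-disk layout; PGDATA is the path from this report):

```shell
# Watch the per-database temp directories grow while the query runs.
# PGDATA path taken from this report; the pgsql_tmp location is an assumption.
while : ; do
    du -sh /usr/local/pgsql/data/base/*/pgsql_tmp 2>/dev/null
    sleep 3
done
```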


Here's the query plan:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=18770.77..185690.90 rows=20626 width=140)
   Hash Cond: ((("outer".postalcode)::text = ("inner".old_postalcode)::text) AND ("outer".city_id = "inner".city_id))
   ->  Seq Scan on postalcodes p  (cost=0.00..14742.12 rows=825012 width=18)
   ->  Hash  (cost=9955.64..9955.64 rows=366625 width=126)
         ->  Merge Join  (cost=69.83..9955.64 rows=366625 width=126)
               Merge Cond: ("outer".state_id = "inner".state_id)
               ->  Index Scan using cities_state_id on cities c  (cost=0.00..4203.13 rows=73325 width=8)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=122)
                     Sort Key: people_locations.state_id
                     ->  Seq Scan on people_locations  (cost=0.00..20.00 rows=1000 width=122)
(10 rows)


Here's the inner query by itself:
 
datingsite=> EXPLAIN ANALYZE     SELECT
datingsite->         p.postalcode AS pcode,
datingsite->         p.city_id AS cid,
datingsite->         c.state_id AS sid
datingsite->     FROM postalcodes p
datingsite->     JOIN cities c USING (city_id);
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2091.56..47451.98 rows=825012 width=22) (actual time=1132.994..16764.241 rows=825012 loops=1)
   Hash Cond: ("outer".city_id = "inner".city_id)
   ->  Seq Scan on postalcodes p  (cost=0.00..14742.12 rows=825012 width=18) (actual time=0.077..4657.842 rows=825012 loops=1)
   ->  Hash  (cost=1585.25..1585.25 rows=73325 width=8) (actual time=1131.010..1131.010 rows=0 loops=1)
         ->  Seq Scan on cities c  (cost=0.00..1585.25 rows=73325 width=8) (actual time=0.031..738.582 rows=73325 loops=1)
 Total runtime: 20475.610 ms
(6 rows)



Both tables are rather large:


datingsite=> select count(*) from people_locations ;
 count  
--------
 131266
(1 row)
 
Time: 2566.282 ms
datingsite=> select count(*) from postalcodes;      
 count  
--------
 825012
(1 row)
 
Time: 4246.360 ms
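One thing I notice: the plan above estimates people_locations at the planner's default of rows=1000, while the table actually holds 131266 rows, which suggests it may never have been ANALYZEd. I have not verified whether the misestimate is what triggers the runaway temp-file usage, but refreshing the statistics would be a quick check:

```sql
-- Refresh planner statistics; the row estimate for people_locations
-- in a re-run EXPLAIN should then be close to 131266
ANALYZE people_locations;
EXPLAIN SELECT * FROM people_locations;
```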



Here is the schema:



datingsite=> \d postalcodes;
           Table "public.postalcodes"
   Column   |         Type          | Modifiers 
------------+-----------------------+-----------
 postalcode | character varying(10) | not null
 city_id    | integer               | not null
Indexes:
    "postalcodes_pkey" PRIMARY KEY, btree (postalcode, city_id)
Foreign-key constraints:
    "postalcodes_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(city_id) ON DELETE CASCADE
 
datingsite=> \d people_locations;
                           Table "public.people_locations"
     Column      |          Type          |                 Modifiers                 
-----------------+------------------------+-------------------------------------------
 person_id       | integer                | not null
 city_id         | integer                | not null default 0
 postalcode      | character varying(30)  | not null default 'N/A'::character varying
 postalcode_city | integer                | not null default 0
 country_iso     | integer                | not null default 0
 state_id        | integer                | not null default 0
 areacode        | integer                | not null default 0
 old_postalcode  | character varying(10)  | not null default ''::character varying
 old_cityname    | character varying(128) | not null default ''::character varying
Indexes:
    "people_locations_pkey" PRIMARY KEY, btree (person_id)
    "people_loc_postalcode" btree (postalcode)
Foreign-key constraints:
    "people_locations_person_id_fkey" FOREIGN KEY (person_id) REFERENCES people(person_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
    "people_locations_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(city_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "people_locations_country_iso_fkey" FOREIGN KEY (country_iso) REFERENCES countries(country_iso)
    "people_locations_state_id_fkey" FOREIGN KEY (state_id) REFERENCES states(state_id)


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible: 
----------------------------------------------------------------------
I can reproduce the problem every time I run that query. If any further debugging info is needed, I'm happy to provide it.




If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
