[BUGS] BUG #6763: Severe memory leak with arrays and hstore
The following bug has been logged on the website:

Bug reference:      6763
Logged by:          Luben Karavelov
Email address:      karave...@mail.bg
PostgreSQL version: 9.1.4
Operating system:   Debian Linux
Description:

I was trying to migrate a big ER table (user preferences) into a new table.

Old table:

old_prefs (
    user_id integer NOT NULL,
    name    varchar NOT NULL,
    value   varchar NOT NULL
);

New table:

preferences (
    user_id integer PRIMARY KEY,
    prefs   hstore
);

The query I tried to use is:

INSERT INTO preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;

But the postgres backend consumed all the available memory (6 GB free + 4 GB swap) and was finally killed by the kernel. It is an 8 GB RAM machine, and here are the memory options from the config:

shared_buffers = 1800MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 256MB
max_stack_depth = 2MB

Finally I managed to migrate it in batches of 100-200k user ids, disconnecting after each query in order to free the backend and the leaked memory.

Best regards
Luben
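A minimal sketch of the batched workaround described above; the user_id ranges are illustrative (the real batches covered 100-200k rows each), and the session was disconnected between batches so the backend process, and whatever memory it had accumulated, went away:

INSERT INTO preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
WHERE user_id >= 0 AND user_id < 80
GROUP BY user_id;

-- disconnect, reconnect, then continue with the next range:
INSERT INTO preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
WHERE user_id >= 80 AND user_id < 160
GROUP BY user_id;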
Re: [BUGS] BUG #6763: Severe memory leak with arrays and hstore
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=160 AND user_id<240 GROUP BY user_id;
INSERT 0 55447
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  0.9 11.5 3277412  945560 ?    Ss   15:23   0:15 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=240 and user_id<340;
 count
--------
 200614
(1 row)

Time: 83,409 ms

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=240 AND user_id<340 GROUP BY user_id;
INSERT 0 87940
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  1.2 16.2 3736968 1331796 ?    Ss   15:23   0:20 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=340 and user_id<380;
 count
--------
 161390
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=340 AND user_id<380 GROUP BY user_id;
ERROR:  insert or update on table "new_preferences" violates foreign key constraint "new_preferences_user_id_fkey"
DETAIL:  Key (user_id)=(3615131) is not present in table "users".
pg=> rollback;
ROLLBACK

Oops - I have to clean up old_prefs; some users were deleted in the meantime:

pg=> delete from old_prefs where user_id not in (select user_id from users);
DELETE 7
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  1.4 26.4 4469520 2157588 ?    Ss   15:23   0:29 postgres: pg pg 10.0.2.71(51734) idle

Nearly 1 GB of growth on a rolled-back transaction.

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=340 AND user_id<380 GROUP BY user_id;
INSERT 0 131803
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  1.7 26.6 4479944 2180536 ?    Ss   15:23   0:35 postgres: pg pg 10.0.2.71(51734) idle

Another batch, bigger this time:

pg=> select count(*) from old_prefs where user_id>=380 and user_id<420;
 count
--------
 327374
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=380 AND user_id<420 GROUP BY user_id;
INSERT 0 177044
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  1.9 33.1 5238968 2710756 ?    Ss   15:23   0:45 postgres: pg pg 10.0.2.71(51734) idle

Another big batch:

pg=> select count(*) from old_prefs where user_id>=420 and user_id<440;
 count
--------
 375352
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=420 AND user_id<440 GROUP BY user_id;
INSERT 0 189095
pg=> commit;
COMMIT

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  2.2 35.7 5438412 2918720 ?    Ss   15:23   0:55 postgres: pg pg 10.0.2.71(51734) idle

Now a smaller batch:

pg=> select count(*) from old_prefs where user_id>=440 and user_id<450;
 count
--------
 219249
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=440 AND user_id<450 GROUP BY user_id;
INSERT 0 99782
pg=> commit;
COMMIT

RSS keeps growing:

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  2.2 36.0 5438412 2943784 ?    Ss   15:23   1:00 postgres: pg pg 10.0.2.71(51734) idle

Let's see if a bigger batch will pass:

pg=> select count(*) from old_prefs where user_id>=450;
 count
--------
 631911
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=450 GROUP BY user_id;
INSERT 0 296541
pg=> commit;
COMMIT

OK, this time it passed, but the backend is over 4 GB:

USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START   TIME COMMAND
postgres 19121  2.2 50.0 7227968 4088928 ?    Ss   15:23   1:17 postgres: pg pg 10.0.2.71(51734) idle

Some observations:

1. The backend does not free allocated memory between transactions.
2. Rolled-back transactions also leak memory.
3. Leaked memory is not linear in the work done: two transactions with 200k keys each leak less than one transaction with 400k keys.

Regarding Tom's question: old_prefs does not fit in work_mem, but it is quite small relative to the total RAM. Isn't work_mem a limit on the memory each backend may allocate for sorting, grouping and aggregation? My understanding is that a larger allocation would spill to disk and would not kill the server. I could be wrong, though.

Thanks in advance and best regards
--
Luben Karavelov
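For anyone repeating these measurements, a minimal sketch of how the per-backend RSS figures above can be watched from a second shell; the PID is the one from this session's ps output and is otherwise illustrative:

pg=> SELECT pg_backend_pid();
-- returns the operating-system PID of this session's backend (19121 here),
-- which can then be monitored from a shell on the server, e.g.:
--   ps -o user,pid,pcpu,pmem,vsz,rss,cmd -p 19121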
Re: [BUGS] BUG #6763: Severe memory leak with arrays and hstore
On Jul 27, 2012, at 8:47 AM, Tom Lane wrote:

> Craig Ringer writes:
>> OK, it's certainly leaking, but not in the same drastic way I was able
>> to reproduce manually a couple of times earlier. Self-contained test
>> case attached.
>
> Using HEAD with stock parameters, I don't see any significant change in
> allocated address space (VSZ): it sits right around 170MB. The reported
> resident set size (RSS) starts from very little and rises to about
> 140MB, but I think that's just an artifact of the process touching more
> and more of the shared-buffers array as it runs. The actual backend
> memory consumption seems to be just a few meg.
>
> I can get it to blow out memory if I set work_mem large enough to
> persuade the planner to use hash aggregation (whereupon it tries to run
> all the array_agg aggregates in parallel). However, that requires
> work_mem set to a couple of GB, and I don't think it's really a bug when
> the backend goes ahead and uses a couple of GB after I told it it could.
>
> It's possible that the OP's problem boiled down to the planner making
> a drastic underestimate of the number of GROUP BY groups, which could
> mislead it into applying hash aggregation when there's not room; or
> if the space used per aggregate was a lot more than the 8K that the
> planner assumes when dealing with array_agg. But neither of those
> errors seems to be happening in this example case.
>
> regards, tom lane

It's good that the bug is not in HEAD. I was testing on 9.1.4.

Definitely the RSS is not just references to shared buffers, because shared_buffers is 1.8 GB and the backend RSS got to 4 GB. My setting for work_mem is 64 MB, so it's quite conservative - the server was tuned for maximum concurrency, not for maximum throughput per single query.

Here is the plan of the insert:

=> explain INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id<20 GROUP BY user_id;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Insert on new_preferences  (cost=65615.89..65617.73 rows=67 width=36)
   ->  HashAggregate  (cost=65615.89..65617.06 rows=67 width=68)
         ->  Bitmap Heap Scan on old_prefs  (cost=17645.25..56555.65 rows=1208032 width=68)
               Recheck Cond: (user_id < 20)
               ->  Bitmap Index Scan on old_prefs_user_id_ids  (cost=0.00..17343.24 rows=1208032 width=0)
                     Index Cond: (user_id < 20)
(6 rows)

So it is using hash aggregation, as you suggested. I have tried the query with hash aggregation disabled and it consumes a lot less memory - a single query migrating the whole table finishes with 900 MB RSS. After "ANALYZE old_prefs" the planner chooses GroupAggregate instead of HashAggregate - you were right about the missing statistics on old_prefs.

Thank you for figuring out this case.

Best regards
--
luben karavelov
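The two remedies that came out of the thread can be sketched as follows; this is a reconstruction from the discussion above, not a verbatim session:

-- Remedy 1: give the planner real statistics on old_prefs so it estimates the
-- number of GROUP BY groups correctly and picks GroupAggregate by itself:
ANALYZE old_prefs;

-- Remedy 2: disable hash aggregation for this session before running the
-- migration, then restore the default afterwards:
SET enable_hashagg = off;

INSERT INTO new_preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;

RESET enable_hashagg;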