[GENERAL] File Fragmentation
Hi, I have a PostgreSQL database with 50 tables. Every minute, sequentially, a batch load 10.000 rows of 250 bytes with a COPY. After a day, i got a database with 50 tables with 1.440 set of 10.000 rows. The tables are cleany and naturally clustered by the inserted timestamp. Each table has data in a file with 1.440 fragments (each day) Now, there is a partition rotation script, that suppress old tables when some size limit happens. Let suppose, that this script runs and suppress only one table qith few days of data, then recreates a new empty one. I got a disk freespace very fragmented, the space used by the rotated table. Then some COPY inserts new data, the tables got new data in theirs files and continue to be fragmented. The new tables begins to grows from the begining of the free space to and is more fragmented that ever. But all the data are always clustered in the tables. After few more rotated tables, all the tables are heavily fragmented and even if the data is clustered inside the file fragments are spread all over the drive. After few days, I see IO wait grows and grows, even when the size of the database stabilises due to the rotation script. I suspect the heavy fragmented files to the cause of the IO wait grows (PostgreSQL on WIndows). How to cope with that ? It seems I can not pregrow file in PostgreSQL. I found a trick: if i created an empty table and i insert dummy data, then i insert good data, then i suppress dummy data, then i vacuum the table (but not a full vacuum) i got a large file with freespace at the begining of the file. If all the files were created with that tricks and larger than the maximum data COPYed, i will have no fragmented files. JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] File Fragmentation
Hi, > It sounds like you are using partitioned tables. your partitions should be > divided up such that they help optimize your queries. that is, minimize the > number of partitions you need to scan for any given query. > > That said, try to make is so that this cleanup script purges whole > partitions, not just deleting some rows. That way new data will fill in > space without fragmentation. The rotated script, as explained, just drops tables and creates empty ones. There are only COPY and SELECT in this database. The problem seems that the IO pattern creates higly fragmented files. I have some files with 1,440 fragments a day. JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] File Fragmentation
Hi, > That doesn't make sense then, to have fragmentation if you are creating new > tables with fresh data copied into them. The files should be pretty much > sequentially written. > > O I see. You're using Windows. Maybe you need some OS with a better > file system that doesn't fragment like that? I know OS other than Windows will be better. But, I think on any OS, I would have some trouble because the pattern of IO. Each minute, each of the 50 tables has 10,000 new rows COPYed. Normaly the filesystem try to keep the file (under the tables) continous with few fragment. But the pattern is such, that it seems to me really difficult to prevent it. Do you have any idea to mitigate the problem on Windows ? JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] File Fragmentation
Hi, I create a test cas on Linux: postgres=# create table a (v int); postgres=# create table b (v int); Then a while(true) over the following script where 24577 and 24580 are the files of the tables a and b #!/bin/sh psql test -c 'insert into a select generate_series(1,10,1);' psql test -c 'insert into b select generate_series(1,10,1);' psql test -c 'checkpoint;' /usr/sbin/filefrag -v 24577 24580 ls -lh 24577 24580 After few minutes, I got 100 extend by files. The file fragmentation happens on Windows and Linux, too. I not sure that the Wait IO on Windows is related to file fragmentation. I try to find a way to analyse the situation. JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] File Fragmentation
Hi, Atfer 30 minutes, on my Linux computer, with 2 files fill one after the other. I got a fragmented files with many back step: # /usr/sbin/filefrag -v 24586 Filesystem type is: ef53 File size of 24586 is 822231040 (200740 blocks, blocksize 4096) ext logical physical expected length flags 0 0 22630402048 12048 2271232 2265087 2048 24096 2277376 2273279 2048 36144 2289664 2279423 2048 48192 2306048 2291711658 58850 2306707 2306705 1390 6 10240 2316288 2308096 2048 7 12288 2308097 2318335102 8 12390 2328576 2308198 1946 9 14336 2336768 2330521 2048 10 16384 2347008 2338815 4096 11 20480 2357248 2351103 2048 12 22528 2385920 2359295 4096 13 26624 2416640 2390015 2048 14 28672 2424832 2418687 4096 15 32768 2439168 2428927 2048 16 34816 2582528 2441215 2048 17 36864 2940928 2584575 2048 18 38912 3045376 2942975 2048 19 40960 1845248 3047423 2048 20 43008 1910784 1847295 2048 21 45056 2017280 1912831 2048 22 47104 2029568 2019327 2048 23 49152 2146304 2031615 2048 24 51200 2213888 2148351 2048 25 53248 3096576 2215935 2048 26 5529640960 3098623 2048 27 573449011243007 2048 28 59392 12492892159 2048 29 61440 102400 126975 2048 30 63488 161792 104447 2048 31 65536 164609 163839680 32 66216 243712 165288 1368 33 67584 307200 245079 2048 34 69632 372736 309247 2048 35 71680 448512 374783 2048 36 73728 495616 450559 2048 37 75776 577536 497663 2048 38 77824 649216 579583 2048 39 79872 724992 651263 2048 40 81920 757760 727039 2048 41 83968 849920 759807 2048 42 86016 909312 851967 2048 43 88064 929792 911359 2048 44 90112 972800 931839 2048 45 92160 968704 974847604 46 92764 1040384 969307 1444 47 94208 1081344 1041827 2048 48 96256 1134592 1083391 2048 49 98304 1171456 1136639 2048 50 100352 1165312 1173503 2048 51 102400 1202176 1167359 2048 52 104448 1234944 1204223 2048 53 106496 1267712 1236991 2048 54 108544 1298432 1269759 2048 55 110592 1325056 1300479 2048 56 112640 1372160 1327103 2048 57 114688 1384448 1374207 2048 58 116736 1433600 1386495 2048 59 118784 1452032 1435647 2048 60 120832 1499136 1454079 2048 61 122880 1529856 1501183 2048 62 124928 1560576 1531903 2048 63 126976 1687552 1562623 2048 64 129024 2125824 1689599 2048 65 131072 534560 2127871 2048 66 133120 544800 536607 2048 67 135168 1056800 546847 2048 68 137216 2629789 1058847 6144 69 143360 2867200 2635932 2048 70 145408 2887680 2869247 2048 71 14745675776 2889727 2048 72 149504 299008077823 2048 73 151552 3014656 2992127 2048 74 153600 3094528 3016703 2048 75 155648 3117056 3096575 2048 76 15769663488 3119103 2048 77 159744 19046465535 2048 78 161792 215040 192511 2048 79 163840 284672 217087 2048 80 165888 378880 286719 2048 81 167936 419840 380927 2048 82 169984 432128 421887 2048 83 172032 501760 434175 2048 84 174080 598016 503807 2048 85 176128 659456 600063 2048 86 178176 700416 661503 2048 87 180224 772096 702463 2048 88 182272 829440 774143 2048 89 184320 864256 831487 2048 90 186368 903168 866303 2048 91 188416 1030144 905215 2048 92 190464 1255424 1032191 2048 93 192512 1431552 1257471 2048 94 194560 1542144 1433599 2048 95 196608 1732608 1544191 2048 96 198656 1740800 1734655 2048 97 200704 1787904 1742847 36 eof 24586: 98 extents found You can see 3 back steps: position 26, 71, 76. Just imagine, 50 files over few days, the files become heavely fragmented with many back steps. So a sequential scan (logicaly from PostgreSQL point of view) is in fact random (filesystem point of view). JG Le Mercredi 20 Mars 2013 15:47 CET, j...@rilk.com a écrit: > Hi, > > I create a test cas on Linux: > postgres=# create table a (v int); > postgres=# create table b (v int); > > > Then a while(true) over the following script where 24577 and 24580 are the > files of the tables a and b > #!/bin/sh > psql test -c 'insert into a select generate_series(1,10,1);' > psql test -c 'insert into b select generate_series(1,10,1);' > psql test -c 'checkpoint;' > /usr/sbin/filefrag -v 24577 24580 > ls -lh 24577 24580 > > After few minutes, I got 100 extend by files. > > > The file fragmentation
[GENERAL] Where is diskchecker.pl ?
Hi, The PostgreSQK documentation refers to diskchecker.pl on the page http://brad.livejournal.com/2116715.html But on this page, the given link for diskchecker.pl does not exist anymore. After some unsuccessfull queries on Google to find the missing file, I wonder if one of you have a lin or a copy of this file. Thank you. -- Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Heavy Function Optimisation
Hi, In a projet, I have an heavy fonction that double the time of the query. I was surprised because the function was IMMUTABLE but no cache happens. So I wrote a small test. test.sql --- \timing on CREATE OR REPLACE FUNCTION dum(a int) RETURNS int LANGUAGE SQL STRICT IMMUTABLE AS $$ SELECT pg_sleep(1); SELECT 1000+$1; $$; SELECT dum(a) FROM ( SELECT 1::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a ) t; WITH data AS ( SELECT 1::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a) ,map AS (SELECT a, dum(a) FROM data GROUP BY a) SELECT m.dum FROM data AS d JOIN map AS m ON d.a=m.a; --- test=# \i test.sql Timing is on. CREATE FUNCTION Time: 1.479 ms dum -- 1001 1002 1002 1003 1003 1003 (6 rows) Time: 6084.172 ms a | dum ---+-- 1 | 1001 2 | 1002 2 | 1002 3 | 1003 3 | 1003 3 | 1003 (6 rows) Time: 3029.617 ms I was expecting the first query takes only 3 seconds, because I was (wrongly) thinking the results of the computation of the function computation was cached. So I emulate it with the WITH query to compute only one time by value the function dum. Do you think, this optimisation may be added to the optimizer ? -- Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Coalesce bug ?
Hi, In PostgreSQL 9.2, I have the following behavior, and I found it strange. ps3 is executed or "never executed" ? !!! JG [postgres@]test=# create or replace function ps3(a int) returns int as $$ BEGIN RAISE WARNING 'Call ps3(%)=%',$1,$1; RETURN $1::int; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE; CREATE FUNCTION Temps : 22,632 ms [postgres@]test=# select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 coalesce -- 1 (1 ligne) Temps : 0,692 ms [postgres@]test=# select coalesce( ps3(1), ps3(2) ); WARNING: Call ps3(1)=1 coalesce -- 1 (1 ligne) Temps : 0,441 ms [postgres@]test=# explain (analyze, verbose, buffers) select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 QUERY PLAN Result (cost=0.02..0.03 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=1) Output: COALESCE($0, $1) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) Output: 1 InitPlan 2 (returns $1) -> Result (cost=0.00..0.01 rows=1 width=0) (never executed) Output: 2 Total runtime: 0.024 ms (9 lignes) Temps : 0,819 ms -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Coalesce bug ?
Hi, > In PostgreSQL 9.2, I have the following behavior, and I found it strange. Sorry the test was with 9.1.6 # psql -V psql (PostgreSQL) 9.1.6 JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Coalesce bug ?
Hi, Please test this script on a PostgreSQL 9.1.6, create or replace function ps3(a int) returns int as $$ BEGIN RAISE WARNING 'Call ps3(%)=%',$1,$1; RETURN $1::int; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE; SELECT ps3(1); SELECT ps3(2); select coalesce( (select ps3(1)), (SELECT ps3(2)) ); explain (verbose, analyze, buffers) select coalesce( (select ps3(1)), (SELECT ps3(2)) ); select coalesce( ps3(1), ps3(2) ); The result will be [postgres@]test=# create or replace function ps3(a int) returns int as $$ BEGIN test$# RAISE WARNING 'Call ps3(%)=%',$1,$1; RETURN $1::int; test$# END; $$ LANGUAGE plpgsql STRICT IMMUTABLE; CREATE FUNCTION Temps : 13,232 ms [postgres@]test=# SELECT ps3(1); WARNING: Call ps3(1)=1 ps3 - 1 (1 ligne) Temps : 0,975 ms [postgres@]test=# SELECT ps3(2); WARNING: Call ps3(2)=2 ps3 - 2 (1 ligne) Temps : 0,473 ms [postgres@]test=# select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 coalesce -- 1 (1 ligne) Temps : 0,681 ms [postgres@]test=# explain (verbose, analyze, buffers) select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 QUERY PLAN Result (cost=0.02..0.03 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops =1) Output: COALESCE($0, $1) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows =1 loops=1) Output: 1 InitPlan 2 (returns $1) -> Result (cost=0.00..0.01 rows=1 width=0) (never executed) Output: 2 Total runtime: 0.022 ms (9 lignes) Temps : 0,774 ms [postgres@]test=# select coalesce( ps3(1), ps3(2) ); WARNING: Call ps3(1)=1 coalesce -- 1 (1 ligne) Temps : 0,562 ms [postgres@]test=# There is a bug ! Thank you for the documentation link, but it does not help me. JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Coalesce bug ?
Hi, Test done on PostgreSQL 9.2.1 pgb=# create or replace function ps3(a int) returns int as $$ BEGIN pgb$# RAISE WARNING 'Call ps3(%)=%',$1,$1; RETURN $1::int; pgb$# END; $$ LANGUAGE plpgsql STRICT IMMUTABLE; CREATE FUNCTION pgb=# SELECT ps3(1); WARNING: Call ps3(1)=1 ps3 - 1 (1 row) pgb=# SELECT ps3(2); WARNING: Call ps3(2)=2 ps3 - 2 (1 row) pgb=# select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 coalesce -- 1 (1 row) pgb=# explain (verbose, analyze, buffers) select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 WARNING: Call ps3(2)=2 QUERY PLAN Result (cost=0.02..0.03 rows=1 width=0) (actual time=0.014..0.015 rows=1 loops=1) Output: COALESCE($0, $1) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1) Output: 1 InitPlan 2 (returns $1) -> Result (cost=0.00..0.01 rows=1 width=0) (never executed) Output: 2 Total runtime: 0.079 ms (9 rows) pgb=# select coalesce( ps3(1), ps3(2) ); WARNING: Call ps3(1)=1 coalesce -- 1 (1 row) There is a bug too. JG -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Coalesce bug ?
Hi, Interesting idea. With VOLATILE, the bug disappears. With IMMUTABLE, the EXPLAIN and the execution does not match That is a bug. Even if the behavior has to be different in VOLATILE and IMMUTABLE, the EXPLAIN and the execution MUST be coherent. JG [postgres@]test=# create or replace function ps3(a int) returns int as $$ BEGIN test$# RAISE WARNING 'Call ps3(%)=%',$1,$1; RETURN $1::int; test$# END; $$ LANGUAGE plpgsql STRICT VOLATILE; CREATE FUNCTION Temps : 127,417 ms [postgres@]test=# SELECT ps3(1); WARNING: Call ps3(1)=1 ps3 - 1 (1 ligne) Temps : 0,941 ms [postgres@]test=# SELECT ps3(2); WARNING: Call ps3(2)=2 ps3 - 2 (1 ligne) Temps : 0,413 ms [postgres@]test=# select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 coalesce -- 1 (1 ligne) Temps : 0,501 ms [postgres@]test=# explain (verbose, analyze, buffers) select coalesce( (select ps3(1)), (SELECT ps3(2)) ); WARNING: Call ps3(1)=1 QUERY PLAN Result (cost=0.52..0.53 rows=1 width=0) (actual time=0.072..0.072 rows=1 loops =1) Output: COALESCE($0, $1) InitPlan 1 (returns $0) -> Result (cost=0.00..0.26 rows=1 width=0) (actual time=0.067..0.067 rows =1 loops=1) Output: ps3(1) InitPlan 2 (returns $1) -> Result (cost=0.00..0.26 rows=1 width=0) (never executed) Output: ps3(2) Total runtime: 0.095 ms (9 lignes) Temps : 0,630 ms [postgres@]test=# select coalesce( ps3(1), ps3(2) ); WARNING: Call ps3(1)=1 coalesce -- 1 (1 ligne) Temps : 0,451 ms [postgres@]test=# -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Heavy function optimisation ?
Hi, In a projet, I have an heavy fonction that double the time of the query. I was surprised because the function was IMMUTABLE but no cache happens. So I wrote a small test. test.sql --- \timing on CREATE OR REPLACE FUNCTION dum(a int) RETURNS int LANGUAGE SQL STRICT IMMUTABLE AS $$ SELECT pg_sleep(1); SELECT 1000+$1; $$; SELECT dum(a) FROM ( SELECT 1::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a ) t; WITH data AS ( SELECT 1::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 2::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a UNION ALL SELECT 3::int AS a) ,map AS (SELECT a, dum(a) FROM data GROUP BY a) SELECT m.dum FROM data AS d JOIN map AS m ON d.a=m.a; --- test=# \i test.sql Timing is on. CREATE FUNCTION Time: 1.479 ms dum -- 1001 1002 1002 1003 1003 1003 (6 rows) Time: 6084.172 ms a | dum ---+-- 1 | 1001 2 | 1002 2 | 1002 3 | 1003 3 | 1003 3 | 1003 (6 rows) Time: 3029.617 ms I was expecting the first query takes only 3 seconds, because I was (wrongly) thinking the results of the computation of the function computation was cached. So I emulate it with the WITH query to compute only one time by value the function dum. Do you think, this optimisation may be added to the optimizer ? -- Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Coalesce bug ?
Hi, > Ah. The reason for that is a bit subtle: constant-folding of immutable > functions happens in the same pass over the query tree as simplification > of simplifiable constructs --- including COALESCE. So what's happening > is that eval_const_expressions, working on the COALESCE construct, first > calls itself recursively to simplify the first argument. That leads to > evaluation of ps3(1), and we get back a constant 1. Now we reach a > block of code with this comment: > > /* > * We can remove null constants from the list. For a > * non-null constant, if it has not been preceded by any > * other non-null-constant expressions then it is the > * result. Otherwise, it's the next argument, but we can > * drop following arguments since they will never be > * reached. > */ > > So at this point we realize that the result of the COALESCE() is 1, and > we don't bother to do const-simplification of its remaining arguments. > They're just thrown away, and the final command for execution is nothing > more than "SELECT 1" (as you can see if you do EXPLAIN VERBOSE). > > The other example with sub-SELECTs acts differently because the > sub-SELECT is something of an optimization fence --- "(SELECT 1)" does > not look like a simple Const to eval_const_expressions. > > As you noted upthread, none of this is a bug. Labeling a function > immutable is an explicit statement that it has no side-effects of > interest and can be evaluated whenever the system chooses. If you stick > in side-effects like a RAISE statement, then that lets you peer into > some inner workings of the optimizer, but it's you that's breaking the > rules not the optimizer. > > regards, tom lane Thank you for the explanation. It was tricky to get it, and I got this question as a side effect of some other optimisation works. -- Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Database denormalization
Hi I would like to ask weather PostgreSQL does database denormalization at runtime. That is, for example, if I have a normalized database and I use lots of querys that would run faster on a denormalized database, than will PostgreSQL create a denormalized version of the database for internal use. To specify further, the question is, can I count on PostgreSQL to denormalize the database when it would be better for the performance, or should I always denormalize the database and all the querys myself. I have looked for answers on the subject, but all I managed to find was a wiki article at http://en.wikipedia.org/wiki/Denormalization that says: "The preferred method is to keep the logical design normalised, but allow the database management system (DBMS) to store additional redundant information on disk to optimise query response. In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views (Microsoft SQL Server) or materialised views (Oracle)." So in the case of PostgreSQL, do I also have to use views or are there some other ways? Thanks -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general