Adding the pgsql-bugs list in case the OP posts back.

On Fri, Sep 7, 2012 at 11:29 AM, Pavan Deolasee <pavan.deola...@gmail.com> wrote:
> (Adding -hackers. Did not realize it got dropped.)
>
> On Fri, Sep 7, 2012 at 11:25 PM, Gezeala M. Bacuño II <geze...@gmail.com> wrote:
>>
>> On Fri, Sep 7, 2012 at 7:17 AM, Pavan Deolasee <pavan.deola...@gmail.com> wrote:
>> >
>> > On Fri, Sep 7, 2012 at 7:00 PM, Marie Bacuno II <geze...@gmail.com> wrote:
>> >>
>> >> On Sep 7, 2012, at 2:19, Pavan Deolasee <pavan.deola...@gmail.com> wrote:
>> >>
>> >> > or have long-running transactions?
>> >>
>> >> Yes, but I don't think there were any when the snapshot was taken. Does the
>> >> pg_xlog_location_diff() result from the latest and prior checkpoint upon
>> >> start-up indicate the size of the replayed changes?
>> >
>> > That's the amount of additional WAL generated after you started the server.
>> >
>> >> > BTW, the following query returns ~60GB. That's the amount of WAL
>> >> > written after the server was started and at the end of pg_dump (I
>> >> > don't think pg_xlog_location_diff() is available in the older
>> >> > releases).
>> >> >
>> >> > postgres=# select pg_xlog_location_diff('4450/7A14F280', '4441/5E681F38')/(2^30);
>> >> >      ?column?
>> >> > ------------------
>> >> >  60.1980484202504
>> >>
>> >> It would be great to know what the WALs modified.
>> >
>> > You would need something like xlogdump to decipher them. I quickly tried
>> > this and it seems to work against the 8.4 version that you are running:
>> > https://github.com/snaga/xlogdump
>> >
>> > Download the source code, compile it, and run it against one of the most
>> > recent WAL files in the cluster against which you ran pg_dump. You would
>> > need to set PATH to contain the pg_config of the server you are running.
>> > Please post the output.
>> >
>> > Thanks,
>> > Pavan
>>
>> Here you go:
>>
>> ## last WAL
>> $ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A
>>
>> /dbpool/data/pg_xlog/00000001000044500000007A:
>>
>> Unexpected page info flags 0003 at offset 0
>> Skipping unexpected continuation record at offset 0
>> ReadRecord: record with zero len at 17488/7A14F310
>> Unexpected page info flags 0001 at offset 150000
>> Skipping unexpected continuation record at offset 150000
>> Unable to read continuation page?
>> ** maybe continues to next segment **
>> ---------------------------------------------------------------
>> TimeLineId: 1, LogId: 17488, LogSegment: 122
>>
>> Resource manager stats:
>>   [0]XLOG       : 3 records, 120 bytes (avg 40.0 bytes)
>>                   checkpoint: 3, switch: 0, backup end: 0
>>   [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
>>                   commit: 0, abort: 0
>>   [2]Storage    : 0 record, 0 byte (avg 0.0 byte)
>>   [3]CLOG       : 0 record, 0 byte (avg 0.0 byte)
>>   [4]Database   : 0 record, 0 byte (avg 0.0 byte)
>>   [5]Tablespace : 0 record, 0 byte (avg 0.0 byte)
>>   [6]MultiXact  : 0 record, 0 byte (avg 0.0 byte)
>>   [7]Reserved 7 : 0 record, 0 byte (avg 0.0 byte)
>>   [8]Reserved 8 : 0 record, 0 byte (avg 0.0 byte)
>>   [9]Heap2      : 2169 records, 43380 bytes (avg 20.0 bytes)
>>   [10]Heap      : 0 record, 0 byte (avg 0.0 byte)
>>                   ins: 0, upd/hot_upd: 0/0, del: 0
>>   [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
>>   [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
>>   [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
>>   [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
>>   [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)
>>
>> Backup block stats: 2169 blocks, 16551816 bytes (avg 7631.1 bytes)
>>
>
> I think both my theories are holding up. The Heap2 resource manager is used
> only for vacuum freeze, lazy vacuum, or HOT prune. Given your access pattern,
> I bet it's the third activity that's kicking in on your database. You have
> many pages with dead tuples, and they are getting cleaned at the first
> opportunity, which happens to be the pg_dump that is run immediately after
> the server restart. This shows up as all 2169 WAL records in the file being
> attributed to the Heap2 RM above.
>
> What's additionally happening is that each of these records is on a different
> heap page. The cleanup activity dirties those pages. Since each of these pages
> is being dirtied for the first time after a recent checkpoint and
> full_page_writes is turned ON, the entire page is backed up in the WAL record.
> You can see the exact number of backup blocks in the stats above.
>
> I don't think we have any mechanism to control or stop HOT from doing what it
> wants to do, unless you are willing to run a modified server for this purpose.
> But you can at least bring down the WAL volume by turning full_page_writes OFF.
>
> Thanks,
> Pavan
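For anyone who wants the same number on 8.4 itself, where pg_xlog_location_diff()
is not available, the arithmetic can be done by hand: convert each 'xlogid/offset'
location to bytes and subtract. The query below is only a rough sketch; it assumes
the usual pre-9.3 layout of 255 usable 16 MB segments (0xFF000000 bytes) per xlogid
and the text-to-bit cast trick for hex conversion, so verify it on your own server
before trusting the numbers. The two locations are the ones quoted above.

  SELECT (  ('x' || lpad(split_part('4450/7A14F280', '/', 1), 8, '0'))::bit(32)::bigint * 255 * 16777216
          + ('x' || lpad(split_part('4450/7A14F280', '/', 2), 8, '0'))::bit(32)::bigint )
       - (  ('x' || lpad(split_part('4441/5E681F38', '/', 1), 8, '0'))::bit(32)::bigint * 255 * 16777216
          + ('x' || lpad(split_part('4441/5E681F38', '/', 2), 8, '0'))::bit(32)::bigint )
         AS wal_bytes;   -- divide by 2^30 for GB; should land near the ~60 GB figure above

In practice you would capture pg_current_xlog_location() right before and right
after the pg_dump run and feed those two values into the same expression.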
Great, that finally sheds some light on this. I'll disable full_page_writes for
my next backup and will post back results tomorrow. Thanks.
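For anyone following along: full_page_writes only needs a reload, not a restart,
and on 8.4 there is no ALTER SYSTEM, so the change has to be made in
postgresql.conf by hand. A minimal sketch of the round trip, assuming the edit
has already been made, might look like this:

  -- after setting full_page_writes = off in postgresql.conf
  SELECT pg_reload_conf();   -- pick up the edited configuration file
  SHOW full_page_writes;     -- should now report 'off'
  -- ... run pg_dump ...
  -- then revert the postgresql.conf edit and reload again; running with
  -- full_page_writes off for longer than necessary weakens torn-page
  -- protection after a crash.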