Btw, ps shows:

postgres@darkblue:/data/pgdata/pg_xlog$ ps aux | grep post
postgres 11496  0.1  0.9 161018232 3696076 ?  S   Jan29  2:49 postmaster -i -D /data/pgdata
postgres 11499  0.0  1.6 161097088 6450616 ?  Ss  Jan29  1:39 postgres: checkpointer process
postgres 11500  0.0  0.3 161095036 1414612 ?  Ss  Jan29  0:12 postgres: writer process
postgres 11501  0.0  0.0 161095036   17328 ?  Ss  Jan29  0:05 postgres: wal writer process
postgres 11502  0.0  0.0 161096724    3112 ?  Ss  Jan29  0:11 postgres: autovacuum launcher process
postgres 11503  0.0  0.0     20136     884 ?  Ss  Jan29  0:10 postgres: archiver process   last was 000000010000006E00000034
postgres 11504  0.0  0.0     20816    1412 ?  Ss  Jan29  0:53 postgres: stats collector process
postgres 11507  0.0  0.0 161096264    2652 ?  Ss  Jan29  1:08 postgres: wal sender process postgres 192.168.30.65(45640) streaming 8E/5544E650
postgres 11864  0.0  0.0 161096136    2656 ?  Ss  Jan29  0:51 postgres: wal sender process postgres 192.168.10.95(37378) streaming 8E/5544E650
...
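(In case it helps to relate those numbers: the streaming position reported by
the wal sender processes can be mapped to a segment file name from psql. Just
as a sanity check, using the 8E/5544E650 position shown above:

    psql -c "SELECT pg_xlogfile_name('8E/5544E650')"

which on timeline 1 should come back as something like 000000010000008E00000055,
i.e. the senders are currently in the ...8E... series of segments.)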
The archiver process says "last was 000000010000006E00000034" and when I look
into my wal-archive directory I see:

-rw------- 1 postgres postgres 16777216 Jan 31 10:24 000000010000006E00000033
-rw------- 1 postgres postgres 16777216 Jan 31 10:24 000000010000006E00000034
-rw------- 1 postgres postgres 16777216 Jan 29 16:03 000000010000008C0000008E
-rw------- 1 postgres postgres 16777216 Jan 29 16:32 000000010000008C0000008F

The 6E..34 file was just written by the archiver process. But further down
there is a file written at the same time:

...
-rw------- 1 postgres postgres 16777216 Jan 31 10:24 000000010000008E00000054

which seems to match the position of the streaming wal senders. Any ideas?

Thanks,

Neil.

2013/1/31 Neil Worden <nworden1...@gmail.com>

> >>> If your command does overwrite, then the server currently emitting the
> >>> 8D files will become unrecoverable once those files start getting
> >>> overwritten. If it refuses to overwrite, but returns a zero status,
> >>> then the server currently emitting 6D would become unrecoverable once
> >>> it reaches 8D and its "archived" files are not actually being archived
> >>> but are getting deleted from the local pg_xlog anyway.
> >>
> >> Would it not be easier to archive the different servers to different
> >> directories and eliminate the possibility of name collision between
> >> servers?
>
> > Easier? I would say that that is the only sane way of doing it. I
> > was pointing out the consequences of messing it up. A proper
> > archive_command will save you from some self-inflicted disasters, but
> > that does not mean I'm recommending that you should invite those
> > disasters on yourself.
>
> > If the original author is in a production environment, he desperately
> > needs to figure out what is going on, especially so if archive_command
> > is not tested and verified to obey its contract.
>
> > Cheers,
>
> > Jeff
>
> Thanks for your answers.
>
> Yes, we are in a production environment and there are definitely no two
> masters writing to the wal-archive directory, at least none that I am
> aware of. And I can assure you that we are not intentionally working on
> inviting disasters either :-).
>
> The point is, the ..6D.. line started showing up exactly after I set up
> the archive-command (which is, btw: archive_command = 'test ! -f
> /data/backup/walbackup/%f && cp %p /data/backup/walbackup/%f', running on
> Ubuntu Linux Server).
>
> The situation is as follows:
>
> All the machines concerned are running 9.2.2 64-bit on Ubuntu Linux
> Server 12.10, installed from source, all following exactly the same
> procedure. We have a hot standby in a different location, connected over
> a rather thin line, that has been running since version 9.1 came out.
> That worked flawlessly; we were only bitten by the autovacuums run to
> prevent XID wraparound, which generated relatively high WAL volume, and
> we were not sure whether the network connection could keep up with it
> before wal-files were deleted. Since we had to physically transfer a
> backup once for other reasons, we set wal_keep_segments to 8192 to have
> enough fallback time.
>
> The total size of the database is currently 313 GB and we are in the
> process of rethinking our backup/emergency strategy (currently daily full
> dumps + a hot standby for read-only queries in the other location + one
> hot standby in the same office).
>
> So we set up another machine, intended to become another hot standby
> later, but for now we just use it to experiment with pg_receivexlog.
> Since our current backup strategy does not allow for PIT recovery, and we
> had been thinking about wal-archiving for a long time, we decided to
> implement it, and that's what we did. Now we have the two "lines" in the
> archive directory.
>
> Could the high number of wal_keep_segments have an impact?
> Does the fact that there were already a lot of existing wal-files when I
> set up archiving and the archive_command have an impact?
>
> Jeff, you wrote:
>
> >> And how would I restore the needed file names for recovery if I decide
> >> to keep one base backup and then a very long chain of wal-files?
>
> > There should be no need for that.
>
> When you said there would be no need for that, did you mean restoring the
> files for recovery, or keeping a base backup and the chain of wal-files?
>
> I understand that the archive_command is responsible for not overwriting
> wal-files. But if that situation occurs, and if I understand you correctly
> it will, what do I do?
> If the wal-archive files will be overwritten at some point in time, how is
> it possible to keep only one single base backup taken at time point t and
> from then on store only the following wal-files (however inconvenient that
> may be), so as to be able to restore to any point in time after t?
>
> Thanks,
>
> Neil
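On the archive_command quoted above: the 'test ! -f ... && cp ...' form
already refuses to overwrite an existing file and returns a nonzero status in
that case, but the cp itself is not atomic, so a crash mid-copy can leave a
truncated file under the final name that then blocks archiving of that
segment. A small wrapper script is one way to harden this; the following is
only a sketch, and the script name /usr/local/bin/archive_wal.sh is made up
(only the /data/backup/walbackup path is taken from the setup above):

    #!/bin/sh
    # archive_wal.sh <full-path-to-segment> <segment-file-name>
    # called from postgresql.conf as:
    #   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
    set -e

    SRC="$1"                            # %p: path of the segment in pg_xlog
    DST="/data/backup/walbackup/$2"     # %f: bare segment file name

    # never overwrite an already-archived segment; a nonzero exit tells
    # the archiver the segment was NOT archived
    if [ -f "$DST" ]; then
        exit 1
    fi

    # copy under a temporary name, then rename into place, so a crash
    # cannot leave a half-written file under the final name
    cp "$SRC" "$DST.tmp"
    mv "$DST.tmp" "$DST"

The same idea carries over to archiving across the network (rsync/scp to the
backup host), as long as the command only returns 0 once the file is safely
in place.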
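A side note on wal_keep_segments = 8192: with the default 16 MB segment size
(the 16777216-byte files in the listing above), that works out to

    8192 segments x 16 MB/segment = 131072 MB = 128 GB

of WAL retained in pg_xlog purely as a fallback window for the standbys,
independent of anything the archiver does.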
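And on the last question (one base backup at time t plus only the WAL
archived after t): that is exactly the layout point-in-time recovery expects,
so nothing older than the base backup ever needs to be kept or restored.
Roughly, as a sketch with made-up directory names and timestamps apart from
the walbackup path quoted above:

    # on the master: take a base backup
    pg_basebackup -D /data/backup/base_20130131 -P

    # to recover to some point after the backup: copy the base backup into
    # a fresh data directory, then create recovery.conf there with e.g.
    #   restore_command      = 'cp /data/backup/walbackup/%f %p'
    #   recovery_target_time = '2013-01-31 12:00:00'
    # (omit recovery_target_time to replay all archived WAL)

With that layout the archive is never overwritten; instead, segments older
than the newest base backup you intend to keep are pruned from the old end
(the contrib program pg_archivecleanup can do this).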