Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-16 Thread Greg Stark
On Sat, Feb 15, 2014 at 11:45 AM, Andres Freund wrote: > I guess the theoretically correct thing would be to make all WAL records > about truncation and unlinking contain the current size of the relation, > but especially with deletions and forks that will probably turn out to > be annoying to do.

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-15 Thread Andres Freund
On 2014-02-14 22:30:45 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-02-14 20:46:01 +, Greg Stark wrote: > >> Going over this I think this is still a potential issue: > >> On 31 Jan 2014 15:56, "Andres Freund" wrote: > >>> I am not sure that explains the issue, but I think the re

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Tom Lane
Andres Freund writes: > On 2014-02-14 20:46:01 +, Greg Stark wrote: >> Going over this I think this is still a potential issue: >> On 31 Jan 2014 15:56, "Andres Freund" wrote: >>> I am not sure that explains the issue, but I think the redo action for >>> truncation is not safe across crashes.

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Andres Freund
On 2014-02-14 20:46:01 +, Greg Stark wrote: > Going over this I think this is still a potential issue: > > On 31 Jan 2014 15:56, "Andres Freund" wrote: > > > > > I am not sure that explains the issue, but I think the redo action for > > truncation is not safe across crashes. A XLOG_SMGR_TRU

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Greg Stark
Going over this I think this is still a potential issue: On 31 Jan 2014 15:56, "Andres Freund" wrote: > > I am not sure that explains the issue, but I think the redo action for > truncation is not safe across crashes. A XLOG_SMGR_TRUNCATE will just > do a smgrtruncate() (and then mdtruncate) wh

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark writes: > On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane wrote: >> That's what's bothering me, too. On the other hand, if we can't think of >> a scenario where it'd be necessary to replay the high-offset update, then >> I'm disinclined to mess with the code further. > And the whole point

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane wrote: >> The scenario I could come up with that didn't require a broken base backup >> was that there was an earlier truncate or vacuum. So the sequence is high >> offset reference, truncate, growth, crash. All possibly on a single >> database. > > That's

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark writes: >> I think what you're arguing is that we should see WAL records filling the >> rest of segment 1 before we see any references to segment 2, but if that's >> the case then how did we get into the situation you reported? Or is it >> just that it was a broken base backup to start

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
> I think what you're arguing is that we should see WAL records filling the > rest of segment 1 before we see any references to segment 2, but if that's > the case then how did we get into the situation you reported? Or is it > just that it was a broken base backup to start with? The scenario I c

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Andrea Suisani
Hi all, On 02/12/2014 08:27 PM, Greg Stark wrote: On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane wrote: Greg Stark writes: For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? WAL-E actually didn't restore a whole 1GB file due to a tra

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark writes: > On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane wrote: >> Oh, wait a minute. It's not just a matter of whether we find the right >> block: we also have to consider whether XLogReadBufferExtended will >> apply the right "mode" behavior. Currently, it supposes that all pages >> pas

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane wrote: > Oh, wait a minute. It's not just a matter of whether we find the right > block: we also have to consider whether XLogReadBufferExtended will > apply the right "mode" behavior. Currently, it supposes that all pages > past the initially observed E

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: > What I think we probably want to do is forcibly cause the target page > to exist, using a P_NEW loop like what I committed, and then decide > on the basis of whether it's all-zeroes whether to consider it invalid > or not. This seems sane on the grounds that it's just the extension > to

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: > Greg Stark writes: >> WAL-E actually didn't restore a whole 1GB file due to a transient S3 >> problem, in fact a bunch of them. > Hah. Okay, I think we can write this issue off as closed then. Oh, wait a minute. It's not just a matter of whether we find the right block: we also have

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark writes: > On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane wrote: >> Greg Stark writes: >>> This does possibly allocate an extra block past the target block. I'm >>> not sure how surprising that would be for the rest of the code. >> Should be fine; we could end up with an extra block after

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane wrote: > Greg Stark writes: >> On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane wrote: >>> How about the attached instead? > >> This does possibly allocate an extra block past the target block. I'm >> not sure how surprising that would be for the rest of the co

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark writes: > On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane wrote: >> How about the attached instead? > This does possibly allocate an extra block past the target block. I'm > not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: > Greg Stark writes: >> (Or maybe the hot backup >> process could just catch the files in this state if a table is rapidly >> growing and it doesn't take care to avoid picking up new files that >> appear after it starts?) > That's a possible explanation I guess, but it doesn't seem terri

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane wrote: > How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. For what it's worth I've confirmed the bug in wal-e caused the initial problem.

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark writes: > So I think I've come up with a scenario that could cause this. I don't > think it's exactly what happened here but maybe something analogous > happened with our base backup restore. I agree it seems like a good idea for XLogReadBufferExtended to defend itself against successi

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark writes: > So here's my attempt to rewrite this logic. I ended up refactoring a > bit because I found it unnecessarily confusing having the mode > branches in several places. I think it's much clearer just having two > separate pieces of logic for RBM_NEW and the extension cases since al

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is the

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. On the primary you extend a table a bunch, including adding new segments, but crash before committing (or checkpointing)

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-11 Thread Greg Stark
On Sun, Feb 9, 2014 at 2:54 PM, Greg Stark wrote: > Bad block's page header -- this is in the 56'th relation segment: > > =# select > (page_header(E'\\x2005583b05aa050028001805002004201098e00f2090e00f088d24061885e00f')).*; > lsn | tli | flags | lower | u

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-09 Thread Greg Stark
On Thu, Feb 6, 2014 at 11:41 PM, Greg Stark wrote: > > That doesn't explain the other instance or the other copies of this > database. I think the most productive thing I can do is switch my > attention to the other database to see if it really looks like the > same problem. So here's an instance

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-07 Thread Andres Freund
On 2014-02-06 20:06:03 -0500, Tom Lane wrote: > Andres Freund writes: > > That reminds me, not that I directly see how it could be responsible, > > there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I > > don't think we came to an agreement in that thread how to fix the > > problem

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Andres Freund writes: > That reminds me, not that I directly see how it could be responsible, > there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I > don't think we came to an agreement in that thread how to fix the > problem. Hm, yeah. I'm not sure I believe Heikki's argument t

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 18:42:05 -0500, Tom Lane wrote: > Greg Stark writes: > > On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund > > wrote: > >> That's not necessarily true. If e.g. the buffer mapping would change > >> racily, the result write from the bgwriter could very well end up > >> increasing the fi

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark writes: > On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund wrote: >> That's not necessarily true. If e.g. the buffer mapping would change >> racily, the result write from the bgwriter could very well end up >> increasing the file size, leaving a hole in between its write and the >> origin

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund wrote: > > That's not necessarily true. If e.g. the buffer mapping would change > racily, the result write from the bgwriter could very well end up > increasing the file size, leaving a hole in between its write and the > original size. a) the segment

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 23:41:19 +0100, Greg Stark wrote: > The problem with the bgwriter being at fault is that from what I can > see the bgwriter will never extend a file. That means the xlog > recovery code must have done it. That means even if the bgwriter came > along and looked at the buffer we just re

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane wrote: > I had noticed that the WAL records that were mis-replayed seemed to > be bunched pretty close together (two of them even adjacent). Could > you confirm that? If so, it seems like we're looking for some condition > that makes mis-replay fairly pr

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark writes: > Both the primary and the standby were 9.1.11 from the get-go. The > database the primary was forked off of was 9.1.10 but as far as I can > tell the primary in the current pair has no problems. > What's worse is we created a new standby from the same base backup and > replaye

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane wrote: > What version were you running before 9.1.11 exactly? I took a look > through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any > changes that seemed even vaguely related to this. There are some > changes in known-transaction tracking,

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Tom Lane
Greg Stark writes: > On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane wrote: >> Can we see the associated WAL records (ie, the ones matching the LSNs >> in the last blocks of these files)? > Sorry, I've lost track of what information I already shared or didn't, Hm. So one of these is a heap update, no

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane wrote: > Greg Stark writes: >> The relfilenodes that have nul blocks before the last block are: > > Can we see the associated WAL records (ie, the ones matching the LSNs > in the last blocks of these files)? Sorry, I've lost track of what information I al

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Tom Lane
Greg Stark writes: > The relfilenodes that have nul blocks before the last block are: Can we see the associated WAL records (ie, the ones matching the LSNs in the last blocks of these files)? regards, tom lane

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
Hm, I'm not entirely convinced those are erroneous replays to wrong blocks. They don't look right but there are no blocks of NULs preceding them. So if they're wrong then they only extended the relations by a single block. The relfilenodes that have nul blocks before the last block are: relfilen

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
I've poked at this a bit more. There are at least 10 relations where the last block doesn't match the block mentioned in the xlog record that its LSN indicates. At least it looks like from the info xlogdump prints. Including two blocks where the "correct" block has the same LSN which maybe means t

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-01 Thread Greg Stark
The plot thickens... Looking at the next relation I see a similar symptom of a single valid block at the end of several segments of nuls. This relation is also a btree on the same table and has a header in the near vicinity of the xlog: d9de7pcqls4ib6=# select (page_header(get_raw_page('event_dat

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-01 Thread Greg Stark
On Fri, Jan 31, 2014 at 8:21 PM, Tom Lane wrote: > So on a filesystem that supports "holes" > in files, I'd expect that the added segments would be fully allocated > if XLogReadBufferExtended did the deed, but they'd be quite small if > _mdfd_getseg did so. The du results you started with sugges
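The distinction drawn here between fully allocated segments and segments containing holes can be checked by comparing a file's apparent size with the space actually allocated to it. The following is a minimal sketch, assuming a Unix-like system where `os.stat` reports `st_blocks` in 512-byte units; the function name `apparent_and_allocated` is hypothetical, and whether a hole is really created depends on the filesystem.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for a file.

    Assumes a Unix-like platform: st_blocks counts 512-byte units.
    A file with holes has allocated bytes well below its apparent size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Run over each segment file of a relation, this shows roughly what `du` (allocated) and `ls -l` (apparent) would report side by side.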

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Josh Berkus writes: > FWIW, we've periodically seen reports from our clients of replica > databases being slightly larger than the master. Nothing reproducible > or as severe as Greg's issue, or we'd have reported it. But this could > be a more widespread issue, just that it affects most users i

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 10:11 PM, Tom Lane wrote: > Yeah, I'd been wondering if the WAL record somehow got corrupted while > in memory (presumably after being CRC-checked). It's a bit hard to see > how though. One thing I mentioned early on but bears repeating is that this instance is 9.1.11. A

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Josh Berkus
On 01/31/2014 01:11 PM, Tom Lane wrote: > Greg Stark writes: >> One thing I keep coming back to is a bad RAM chip setting a bit in the >> block number. But I just can't seem to get it to add up. The difference is >> not a power of two, it had happened on two different machines, and we don't >> see

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark writes: > One thing I keep coming back to is a bad RAM chip setting a bit in the > block number. But I just can't seem to get it to add up. The difference is > not a power of two, it had happened on two different machines, and we don't > see other weirdness on the machine. It seems like

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
One thing I keep coming back to is a bad RAM chip setting a bit in the block number. But I just can't seem to get it to add up. The difference is not a power of two, it had happened on two different machines, and we don't see other weirdness on the machine. It seems like a strange coincidence it wo

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark writes: > So just to summarize, this xlog record: > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, > info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid > 3634978/282 > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, > info:8,

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
So just to summarize, this xlog record: [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid 3634978/282 [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] bkpblock[1]: s

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:41 PM, Tom Lane wrote: >> 400 * 400 * 400 / 2000 * 54 + 1F0C / 2000 >> 11073632 Ooops, it's reading 54 in hex there. > # select ((2^30) * 54.0 + 'x1F0C'::bit(32)::int) / 8192; > ?column? > -- > 7141472 ibase=16 400 * 400 * 400 / 2000 * 36 + 1F0C /
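The segment-and-offset arithmetic in this exchange can be sketched in Python. This is a minimal sketch, assuming PostgreSQL's default 8 kB block size and 1 GB segment size; the function name `block_number` is hypothetical. The offset appears as `1F0C` in the archived text; the value `0x1F0C0000` used below is an inference, chosen because it reproduces both results quoted in the thread (11073632 and 7141472).

```python
BLOCK_SIZE = 8192        # assumed default block size (0x2000 bytes)
SEGMENT_SIZE = 1 << 30   # assumed default segment size (0x400 * 0x400 * 0x400)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per segment


def block_number(segment, byte_offset):
    """Relation-wide block number for a byte offset inside segment file N."""
    return segment * BLOCKS_PER_SEGMENT + byte_offset // BLOCK_SIZE


OFFSET = 0x1F0C0000  # inferred, see the note above

# With ibase=16, bc reads the segment number "54" as hex 0x54 (decimal 84).
print(block_number(0x54, OFFSET))  # 11073632, the out-of-range block
print(block_number(54, OFFSET))    # 7141472, matching the SQL result
```

The first call reproduces the mistaken figure and the second the corrected one, which is why the earlier `get_raw_page` lookup reported the block as out of range.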

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 10:33:16 -0500, Tom Lane wrote: > Andres Freund writes: > > It's interesting that the smgr gets this wrong then (as also evidenced > > by the fact that relation_size does as well). Could you please do a ls > > -l path/to/relfilenode*? > > IIRC, smgrnblocks will stop as soon as it fi

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark writes: > Sorry guys. I transposed two numbers when looking up the relation. > "data_pk" wasn't the right index. > =# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ; > lsn | tli | flags | lower | upper | special | pagesize | > version | prune_xid > --

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
Sorry guys. I transposed two numbers when looking up the relation. "data_pk" wasn't the right index. =# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ; lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid --+-+---+-

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark writes: > On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund wrote: >> Isn't the page 3634978? > The page in the record is. > But the page on disk is in the 54th segment at offset 1F0C > So unless my arithmetic is wrong: > bc -l > ibase=16 > 400 * 400 * 400 / 2000 * 54 + 1F0C /

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 10:33:16 -0500, Tom Lane wrote: > Andres Freund writes: > > It's interesting that the smgr gets this wrong then (as also evidenced > > by the fact that relation_size does as well). Could you please do a ls > > -l path/to/relfilenode*? > > IIRC, smgrnblocks will stop as soon as it fi

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Andres Freund writes: > It's interesting that the smgr gets this wrong then (as also evidenced > by the fact that relation_size does as well). Could you please do a ls > -l path/to/relfilenode*? IIRC, smgrnblocks will stop as soon as it finds a segment that is not 1GB in size. Could you check th
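The behaviour Tom describes, where block counting stops at the first segment that is not a full 1GB, can be modelled as follows. This is a simplified sketch of the description above, not the PostgreSQL source; the function name `count_blocks` and the list-of-sizes input are hypothetical, and 8 kB blocks are assumed.

```python
BLOCKS_PER_SEGMENT = 131072  # assumed: 1 GB segments of 8 kB blocks


def count_blocks(segment_blocks):
    """Total blocks visible when counting stops at the first short segment.

    segment_blocks lists the block count of each segment file, in order.
    """
    total = 0
    for nblocks in segment_blocks:
        total += nblocks
        if nblocks < BLOCKS_PER_SEGMENT:
            break  # later segments are never examined
    return total
```

Under this model, any data sitting in segments after a short one is invisible to the size calculation, which would be consistent with the relation size disagreeing with what is on disk.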

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 15:21:35 +, Greg Stark wrote: > On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund wrote: > >> =# select get_raw_page('data_pkey', 'main', 11073632) ; > >> ERROR: block number 11073632 is out of range for relation "data_pkey" > > > > Isn't the page 3634978? > > The page in the reco

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund wrote: >> =# select get_raw_page('data_pkey', 'main', 11073632) ; >> ERROR: block number 11073632 is out of range for relation "data_pkey" > > Isn't the page 3634978? The page in the record is. But the page on disk is in the 54th segment at offset

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 15:15:24 +, Greg Stark wrote: > On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund wrote: > > > It points to the end of the record (i.e. the beginning of the next). It > > needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush > > enough. > > Ah, in which case the r

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund wrote: > It points to the end of the record (i.e. the beginning of the next). It > needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush > enough. Ah, in which case the relevant record is: [cur:EA1/637140, xid:1418089147, rmid:11(Bt

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 14:59:21 +, Greg Stark wrote: > On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark wrote: > > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, > > info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982 > > blk:3634978 hole_off/len:1240/2072 > > [cur:EA1/6389

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark wrote: > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, > info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982 > blk:3634978 hole_off/len:1240/2072 > [cur:EA1/638988, xid:1418089147, rmid:11(Btree), len/tot_len:18/5894, >

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 14:39:47 +, Greg Stark wrote: > 1261982.53 is entirely nuls. I think that's true for most if not all > of the intervening files, still investigating. > > The 54th segment is nul up to offset 1f0c after which it has valid > looking blocks: It'd be interesting to dump the page

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
1261982.53 is entirely nuls. I think that's true for most if not all of the intervening files, still investigating. The 54th segment is nul up to offset 1f0c after which it has valid looking blocks: # hexdump 1261982.54 | head -100 000 * 1f0c 0e
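A check like the one done here with `hexdump` can also be scripted. The following is a minimal sketch, assuming 8 kB blocks; the function name `first_nonzero_block_offset` is hypothetical. It reports the byte offset of the first block in a segment file that is not entirely NUL bytes.

```python
BLOCK_SIZE = 8192  # assumed default block size


def first_nonzero_block_offset(path, block_size=BLOCK_SIZE):
    """Byte offset of the first block that is not all NULs, or None."""
    zero = bytes(block_size)
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return None  # reached end of file: every block was NUL
            if block != zero[:len(block)]:
                return offset
            offset += len(block)
```

A return value of `None` corresponds to a segment that is entirely NULs; any other value is the offset at which valid-looking data begins.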

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:46:09 +, Greg Stark wrote: > On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund > wrote: > > The slightly more likely explanation for transient errors is that you > > hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That had > > only taken effect if HS has already

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund wrote: > The slightly more likely explanation for transient errors is that you > hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That had > only taken effect if HS has already assembled a snapshot, which can make > such an error vanish

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:09:14 +, Greg Stark wrote: > On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund wrote: > > > >> We're also seeing log entries about "wal contains reference to invalid > >> pages" but these errors seem only vaguely correlated. Sometimes we get > >> the errors but the tables don't g

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:09:14 +, Greg Stark wrote: > On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund wrote: > > > >> We're also seeing log entries about "wal contains reference to invalid > >> pages" but these errors seem only vaguely correlated. Sometimes we get > >> the errors but the tables don't g

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund wrote: > >> We're also seeing log entries about "wal contains reference to invalid >> pages" but these errors seem only vaguely correlated. Sometimes we get >> the errors but the tables don't grow noticeably and sometimes we don't >> get the errors an

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-26 Thread Greg Stark
On Sun, Jan 26, 2014 at 9:45 AM, Andres Freund wrote: > Hi, > > On 2014-01-24 19:23:28 -0500, Greg Stark wrote: >> Since the point release we've run into a number of databases that when >> we restore from a base backup end up being larger than the primary >> database was. Sometimes by a large fact

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-26 Thread Andres Freund
Hi, On 2014-01-24 19:23:28 -0500, Greg Stark wrote: > Since the point release we've run into a number of databases that when > we restore from a base backup end up being larger than the primary > database was. Sometimes by a large factor. The data below is from > 9.1.11 (both primary and standby)