Re: Permission denied on fsync / Win32 (was [BUGS] right sibling is not next child)

2006-04-13 Thread Tom Lane
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > BTW, win32 sometimes has a bad habit of returning access denied for > other things as well - in some cases you can get access denied instead of > sharing violation, and you can often get it from AV and firewalls and > such. Looking at the fsync code i

Re: Permission denied on fsync / Win32 (was [BUGS] right sibling is not next child)

2006-04-13 Thread Magnus Hagander
> > - The file system is NTFS > > OK, anyone know anything about permissions on NTFS? Yes. What do you need to know ;-) BTW, win32 sometimes has a bad habit of returning access denied for other things as well - in some cases you can get access denied instead of sharing violation, and you can of

Re: Permission denied on fsync / Win32 (was [BUGS] right sibling is not next child)

2006-04-13 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > It turns out we've been getting rather huge numbers of "Permission > denied" errors relating to fsync so perhaps it wasn't really a precursor > to the crash as I'd previously thought. > I've pasted in a complete list following this email covering the t

Re: [BUGS] right sibling is not next child

2006-04-13 Thread Peter Brant
Sounds good. There is nothing sensitive in DbTranImageStatus_pkey so if you decide you want it after all, it's there for the asking. Pete >>> Tom Lane <[EMAIL PROTECTED]> 04/13/06 3:30 am >>> Oh, never mind ... I've sussed it.

Permission denied on fsync / Win32 (was [BUGS] right sibling is not next child)

2006-04-13 Thread Peter Brant
It turns out we've been getting rather huge numbers of "Permission denied" errors relating to fsync so perhaps it wasn't really a precursor to the crash as I'd previously thought. I've pasted in a complete list following this email covering the time span from 3/20 to 4/6. The number in the firs
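For readers who want to tally such errors per day from their own server log, here is a minimal Python sketch. It assumes log lines shaped like the "could not fsync segment ... Permission denied" entries quoted elsewhere in this thread; the regular expression and the function name count_fsync_denied are illustrative assumptions, not part of PostgreSQL, and the pattern depends on the log_line_prefix in use.

```python
import re
from collections import Counter

# Pattern modeled on the log lines quoted in this thread.
# The exact prefix layout is an assumption and depends on log_line_prefix.
FSYNC_DENIED = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) [\d:.]+\s*\]\s+(?P<pid>\d+)\s+LOG:\s+"
    r"could not fsync segment (?P<segment>\d+) of relation "
    r"(?P<relation>\d+/\d+/\d+): Permission denied"
)


def count_fsync_denied(lines):
    """Return a Counter mapping date -> number of fsync 'Permission denied' LOG lines."""
    counts = Counter()
    for line in lines:
        match = FSYNC_DENIED.match(line.strip())
        if match:
            counts[match.group("date")] += 1
    return counts


# Two sample lines taken from the thread; only the LOG line is counted,
# so the follow-up ERROR line does not double the tally.
sample = [
    "[2006-03-31 13:00:01.720 ] 2636 LOG: could not fsync segment 0 of "
    "relation 1663/16385/1392439: Permission denied",
    "[2006-03-31 13:00:01.720 ] 2636 ERROR: storage sync failed on magnetic "
    "disk: Permission denied",
]
counts = count_fsync_denied(sample)
print(dict(counts))
```

Counting only the LOG line keeps each failed fsync to a single entry, since the server reports the same failure again at ERROR level.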

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
I wrote: > Does that index contain any sensitive data, and if not could I trouble > you for a copy? I'm still not clear on the mechanism by which the > indexes got corrupted like this. Oh, never mind ... I've sussed it. nbtxlog.c's forget_matching_split() assumes it can look into the page that w

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > (a bunch of these) > [2006-03-31 13:00:01.720 ] 2636 LOG: could not fsync segment 0 of > relation 1663/16385/1392439: Permission denied > [2006-03-31 13:00:01.720 ] 2636 ERROR: storage sync failed on magnetic > disk: Permission denied Yoi. I think we

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
The middle tier transaction log indicates this record was inserted into the county database at 2006-03-31 21:00:32.94. It would have hit the central databases sometime thereafter (more or less immediately if all was well). The Panel table contains some running statistics which are updated frequen

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > One thing that seems strange to me is that the original crash on > Thursday failed on Panel_pkey, but my "vacuum analyze verbose" on a copy > of the crashed database failed on MaintCode / > pg_statistic_relid_att_index. I can't find anything particularly

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
Per the DBAs, there hadn't been any recent crashes before last Thursday. A "vacuum analyze verbose" discovered the problem early Thursday morning. After the PANIC, the database never came back up (the heap_clean_redo: no block / full_page_writes = off problem). One thing that seems strange to me

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > I can't find any duplicates?!? Weirder and weirder. Maybe the table is OK but the index is corrupt? Could it be another symptom of the same problem we're seeing in the Panel_pkey index? I'm currently theorizing that that index might've been corrupted

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
I can't find any duplicates?!? The query select starelid, staattnum, ctid, xmin, xmax, cmin, cmax from pg_statistic p1 where (select count(*) from pg_statistic p2 where p1.starelid = p2.starelid and p1.staattnum = p2.staattnum) > 1 doesn't turn up anything. Nor does dumping select starelid,
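The same duplicate check can also be written as a GROUP BY ... HAVING query. The Python sketch below shows the idea against an in-memory SQLite table; the table name stats and its rows are made up for illustration and merely stand in for pg_statistic, so this demonstrates the query shape only, not the actual catalog.

```python
import sqlite3

# In-memory stand-in table; "stats" and its rows are hypothetical sample data,
# not the real pg_statistic catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (starelid INTEGER, staattnum INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)",
    [(100, 1), (100, 2), (200, 1), (100, 2)],  # (100, 2) appears twice
)

# Group on the would-be unique key and keep only keys that occur more than once.
dups = conn.execute(
    """
    SELECT starelid, staattnum, count(*) AS n
    FROM stats
    GROUP BY starelid, staattnum
    HAVING count(*) > 1
    ORDER BY starelid, staattnum
    """
).fetchall()
print(dups)
conn.close()
```

An empty result from a query of this shape says only that the rows the scan returned have no duplicate keys.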

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > Item 85 -- Length: 56 Offset: 2120 (0x0848) Flags: USED > Block Id: 640 linp Index: 1 Size: 56 > Has Nulls: 0 Has Varwidths: 16384 > Item 86 -- Length: 56 Offset: 2176 (0x0880) Flags: USED > Block Id: 635 linp Index: 1 Size: 56 >

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > bigbird=# vacuum analyze "MaintCode"; > ERROR: duplicate key violates unique constraint > "pg_statistic_relid_att_index" Hm, can you see any rows in pg_statistic with duplicate values of (starelid, staattnum)? If so it'd be useful to look at their ctid

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
It is repeatable. A reindex doesn't work. Pete bigbird=# vacuum analyze "MaintCode"; ERROR: duplicate key violates unique constraint "pg_statistic_relid_att_index" bigbird=# vacuum analyze verbose "MaintCode"; INFO: vacuuming "public.MaintCode" INFO: index "MaintCode_pkey" now contains 19 r

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it > actually doesn't even get to Panel and errors out with: > ERROR: duplicate key violates unique constraint > "pg_statistic_relid_att_index" Hm, my eyebrows just disappeared over the b

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it actually doesn't even get to Panel and errors out with: INFO: analyzing "public.MaintCode" INFO: "MaintCode": scanned 1 of 1 pages, containing 19 live rows and 0 dead rows; 19 rows in sample, 19 estimated total rows ERROR: dupl

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
The index data isn't sensitive, but I should ask for permission nonetheless. I'll send over the '-f' output tomorrow morning. Pete *** * PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1 * * File: 180571 * Options used:

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > I ended up modifying the elog again with the following results: > PANIC: right sibling is not next child in "Panel_pkey", parent is 271, > target is 635, rightsib is 629, nextoffset is 87 OK, so the part of the pg_filedump info we need to see is items 8

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
Try as I might, I wasn't able to get a JIT debugger to give me a memory dump. It seems like postgres.exe is not really crashing in the "unhandled exception" sense (see gdb log below)? Am I missing a configure option? (As an aside, what's the best way to get a core dump on Windows? Can gdb read mem

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes: > PANIC: right sibling is not next child in "Panel_pkey", parent is 271 Hmm ... that's not actually enough info to tell us where to look, is it :-(. Please add the following variables to the elog message, or gdb for them if you can: target

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
Sorry about the delay in responding. We had a bit of difficulty with the test machine. Kevin is also on vacation this week. The problem is repeatable with a VACUUM. I've found the offending block. A (partial) pg_filedump of that block is pasted in below. I'm a little lost as to what the next

[BUGS] right sibling is not next child

2006-04-06 Thread Kevin Grittner
I'm reporting this as a PostgreSQL bug because it involves an index corruption. I can't see any other way our application should be able to corrupt an index. I will attach the tail of the log when the corruption was detected (and the postmaster shut itself down), as well as the subsequent attempt

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> wrote: >> Anyway, that explains your "heap_clean_redo: no block" failure. I think >> you're stuck risking a pg_resetxlog to try to get back into the >> database. > Will do. Before I do that, though, is it worth making a

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Kevin Grittner
>>> On Thu, Apr 6, 2006 at 1:26 pm, in message <[EMAIL PROTECTED]>, > "Kevin Grittner" <[EMAIL PROTECTED]> writes: >> Tom Lane <[EMAIL PROTECTED]> wrote: >>> You weren't by any chance running with full_page_writes = off >>> were you? > >> Yes we were. Apparently I have misunderstood the implica

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> wrote: >> You weren't by any chance running with full_page_writes = off >> were you? > Yes we were. Apparently I have misunderstood the implications of this. So had we all :-(. It just plain doesn't work in 8.1.*, and

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Kevin Grittner
>>> On Thu, Apr 6, 2006 at 12:57 pm, in message <[EMAIL PROTECTED]>, Tom Lane <[EMAIL PROTECTED]> wrote: > "Kevin Grittner" <[EMAIL PROTECTED]> writes: >> Right now the postmaster refuses to start. What is the best way to get >> past that to try what you suggest? > >> [2006-04-06 07:22:50.378

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Right now the postmaster refuses to start. What is the best way to get > past that to try what you suggest? > [2006-04-06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block Hm, did this start happening immediately after the other problem? That wo

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Kevin Grittner
Right now the postmaster refuses to start. What is the best way to get past that to try what you suggest? [2006-04-06 07:22:50.347 ] 3984 LOG: database system was interrupted while in recovery at 2006-04-06 02:19:59 Central Daylight Time [2006-04-06 07:22:50.347 ] 3984 HINT: This probably means

Re: [BUGS] right sibling is not next child

2006-04-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > [2006-04-06 02:19:57.460 ] 3848 PANIC: > right sibling is not next child in "Panel_pkey" This should be repeatable by re-attempting a VACUUM, right? Please find out which page exactly it's unhappy about (either gdb the crash or add a printout of t

[BUGS] right sibling is not next child

2006-04-06 Thread Kevin Grittner
Apologies if this is a duplicate, but my original post stalled and I noticed I had omitted the postgres version, which you will want. I'm reporting this as a PostgreSQL bug because it involves an index corruption. I can't see any other way our application should be able to corrupt an index. I wi