Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Any suggestions how to extract some info out of this?

Does OS X have the catchsegv tool?

No, but I suddenly remembered about CrashReporter, and sure enough it's
catching these crashes:

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000010

Thread 0 Crashed:
0   postmaster  0x001af4ef smgrextend + 12 (smgr.c:485)
1   postmaster  0x00029044 end_heap_rewrite + 208 (rewriteheap.c:278)
2   postmaster  0x000bdc22 cluster_rel + 850 (cluster.c:806)
3   postmaster  0x000be119 cluster + 160 (cluster.c:220)
4   postmaster  0x001b74a8 PortalRunUtility + 233 (palloc.h:84)
5   postmaster  0x001b7784 PortalRunMulti + 237 (pquery.c:1271)
6   postmaster  0x001b80ae PortalRun + 918 (pquery.c:813)
7   postmaster  0x001b2afd exec_simple_query + 656 (postgres.c:965)
8   postmaster  0x001b4b0c PostgresMain + 5628 (postgres.c:3507)
9   postmaster  0x00183973 ServerLoop + 2828 (postmaster.c:2614)
10  postmaster  0x00184b1e PostmasterMain + 2794 (postmaster.c:972)
11  postmaster  0x00130f8e main + 1236 (main.c:188)
12  postmaster  0x00001e86 _start + 216
13  postmaster  0x00001dad start + 41

So it looks like this has got something to do with the MVCC-safe cluster
changes, which is not too surprising considering it started happening
around about then.  Off to have a look ...

I've been looking at the code for a few minutes as well, but haven't found an explanation for that yet.

But I did notice that we're not fsyncing the newly written relation like we should. There's a comment in raw_heap_insert:
        /*
         * Now write the page. We say isTemp = true even if it's not a
         * temp table, because there's no need for smgr to schedule an
         * fsync for this write; we'll do it ourselves before committing.
         */
        smgrextend(state->rs_new_rel->rd_smgr, state->rs_blockno,
                           (char *) page, true);

That's copy-pasted from tablecmds.c. But unlike in tablecmds.c, end_heap_rewrite only fsyncs the new file if we're not WAL-logging. Proposed fix:

Index: src/backend/access/heap/rewriteheap.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/rewriteheap.c,v
retrieving revision 1.1
diff -c -r1.1 rewriteheap.c
*** src/backend/access/heap/rewriteheap.c 8 Apr 2007 01:26:27 -0000 1.1
--- src/backend/access/heap/rewriteheap.c       17 Apr 2007 20:50:05 -0000
***************
*** 272,282 ****
        }

        /*
!        * If not WAL-logging, must fsync before commit.  We use heap_sync
!        * to ensure that the toast table gets fsync'd too.
         */
!       if (!state->rs_use_wal)
!               heap_sync(state->rs_new_rel);

        /* Deleting the context frees everything */
        MemoryContextDelete(state->rs_cxt);
--- 272,284 ----
        }

        /*
!        * Must fsync before commit, even if we've WAL-logged the changes,
!        * because we've written pages outside the buffer manager. See comments
!        * in copy_relation_data in commands/tablecmds.c for more information.
!        *
!        * We use heap_sync to ensure that the toast table gets fsync'd too.
         */
!       heap_sync(state->rs_new_rel);

        /* Deleting the context frees everything */
        MemoryContextDelete(state->rs_cxt);
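
To illustrate the rule the patch enforces, here is a minimal sketch in plain POSIX C, not PostgreSQL's smgr API. It assumes a writer that puts pages into a file directly and so has to fsync the file itself before reporting success. The names DEMO_PAGE_SIZE, write_page_direct and rewrite_and_sync are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative page size; chosen to match PostgreSQL's default 8 kB block. */
#define DEMO_PAGE_SIZE 8192

/*
 * Append one page to fd with plain write() calls, i.e. without going
 * through any buffer cache the application maintains.
 * Returns 0 on success, -1 on error.  Hypothetical helper.
 */
static int
write_page_direct(int fd, const char *page)
{
    size_t done = 0;

    assert(page != NULL);
    while (done < DEMO_PAGE_SIZE)
    {
        ssize_t n = write(fd, page + done, DEMO_PAGE_SIZE - done);

        if (n <= 0)
            return -1;          /* treat short-circuit and errors alike */
        done += (size_t) n;
    }
    return 0;
}

/*
 * Write npages pages to path, then fsync unconditionally before
 * returning success.  Nothing else knows these pages are dirty, so
 * the writer is the only one who can make them durable.
 * Returns 0 on success, -1 on error.  Hypothetical helper.
 */
static int
rewrite_and_sync(const char *path, int npages)
{
    char page[DEMO_PAGE_SIZE];
    int  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    for (int i = 0; i < npages; i++)
    {
        memset(page, 'A' + (i % 26), sizeof(page));
        if (write_page_direct(fd, page) != 0)
        {
            close(fd);
            return -1;
        }
    }

    /* The step the patch makes unconditional: sync before "commit". */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling rewrite_and_sync("some_file", 3) should leave a 3-page file on disk that has been fsynced by the time the call returns 0. The point is only that the fsync does not depend on any "are we logging?" flag.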


BTW: in tablecmds.c the new relation is fsynced with smgrimmedsync, not heap_sync. What about the toast table? It goes through shared buffers as usual, right?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
