Simon Riggs <[EMAIL PROTECTED]> writes:
> My recent patch will prevent server startup, so if you do a fast restart
> to bounce the server and change parameters you'll have to keep the
> server down while the archiver completes (or you kill it).
BTW, I was not planning on having it do that. The ar
On Tue, 2006-05-23 at 11:09 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
> >> I think we just need a PostmasterIsAlive check in the per-file loop.
>
> > ...which would mean the archiver would not outlive postmaster in the
>
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
>> I think we just need a PostmasterIsAlive check in the per-file loop.
> ...which would mean the archiver would not outlive postmaster in the
> event it crashes...which is exactly the time you want it to
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > This doesn't quite get to the nub of the problem: archiver is designed
> > to keep archiving files, even in the event that the postmaster explodes.
> > It will keep archiving until they're all gone.
>
Simon Riggs <[EMAIL PROTECTED]> writes:
> This doesn't quite get to the nub of the problem: archiver is designed
> to keep archiving files, even in the event that the postmaster explodes.
> It will keep archiving until they're all gone.
I think we just need a PostmasterIsAlive check in the per-fi
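
A minimal sketch of the idea under discussion, a liveness check at the top of the per-file archiving loop. It is written in Python for brevity and is not PostgreSQL's code: parent_is_alive, archive_pending, and archive_one are hypothetical names, and comparing os.getppid() with the parent PID recorded at startup is only an assumed stand-in for PostmasterIsAlive.

```python
import os


def parent_is_alive(original_ppid):
    # Assumption: if the parent exits, the child is re-parented and
    # getppid() no longer returns the PID recorded at startup.
    return os.getppid() == original_ppid


def archive_pending(files, archive_one, original_ppid, is_alive=parent_is_alive):
    """Archive files in order, stopping as soon as the parent is gone.

    archive_one is a hypothetical callback that returns True on success.
    Returns the list of files that were archived.
    """
    archived = []
    for name in files:
        # Check before each file, so an orphaned worker stops at the
        # next file boundary instead of draining the whole backlog.
        if not is_alive(original_ppid):
            break
        if not archive_one(name):
            break  # leave the failed file for a later retry
        archived.append(name)
    return archived
```

With the check inside the loop, an orphaned archiver finishes at most the file it is working on. That is the trade-off raised in the replies: the archiver no longer keeps archiving after a postmaster crash.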
On Fri, 2006-05-19 at 17:27 +0100, Simon Riggs wrote:
> On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > OK, I'm on it.
> >
> > What solution have you got in mind? I was thinking about an fcntl lock
> > to ensure only one archiver is active in a
On Tue, 23 May 2006, Tom Lane wrote:
I'm still thinking that the simplest explanation is that $PGDATA/pg_clog/
is on the NAS device. Please double-check the file locations.
I know that seems like an excellent candidate, but it really isn't, I swear.
In fact, you almost had me convinced the l
Jeff Frost <[EMAIL PROTECTED]> writes:
> I tried both pulling the plug on the CIFS server and unsharing the CIFS share,
> but pgbench continued completely unconcerned. I guess the failure mode of the
> NAS device in the customer colo must be something different that I don't yet
> know how
On Sun, 21 May 2006, Jeff Frost wrote:
So the chances of the original problem being archiver related are
receding...
This is possible, but I guess I should try and reproduce the actual problem
with the same archive_command script and a CIFS mount just to see what
happens. Perhaps the real r
On Sun, 21 May 2006, Simon Riggs wrote:
I've been futzing with trying to reproduce the original problem for a few days
and so far postgres seems to be just fine with a long delay on archiving, so
now I'm rather at a loss. In fact, I currently have 1,234 xlog files in
pg_xlog, but the archiver i
Jeff Frost <[EMAIL PROTECTED]> writes:
> Well now, will you look at this:
> postgres 20228 1 0 May17 ? 00:00:00 postgres: archiver process
> postgres 20573 1 0 May17 ? 00:00:00 postgres: archiver process
> postgres 23817 23810 0 May17 pts/11 00:00:00 postgres: archiver p
On Sun, 2006-05-21 at 14:16 -0700, Jeff Frost wrote:
> On Fri, 19 May 2006, Simon Riggs wrote:
>
> >> Now I can run my same pg_bench, or do you guys
> >> have any other suggestions on attempting to reproduce the problem?
> >
> > No. We're back on track to try to reproduce the original error.
>
>
On Fri, 19 May 2006, Simon Riggs wrote:
Now I can run my same pg_bench, or do you guys
have any other suggestions on attempting to reproduce the problem?
No. We're back on track to try to reproduce the original error.
I've been futzing with trying to reproduce the original problem for a few
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote:
> On Fri, 19 May 2006, Tom Lane wrote:
>
> > What I'd suggest is resuming the test after making sure you've killed
> > off any old archivers, and seeing if you can make any progress on
> > reproducing the original problem. We definitely need a
On Fri, 19 May 2006, Tom Lane wrote:
What I'd suggest is resuming the test after making sure you've killed
off any old archivers, and seeing if you can make any progress on
reproducing the original problem. We definitely need a
multiple-archiver interlock, but I think that must be unrelated to
On Fri, 19 May 2006, Tom Lane wrote:
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened. The orphaned archiver
would eventually quit.
Do you have logs that would let you check when the production postmaster
was restarted?
I l
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > OK, I'm on it.
>
> What solution have you got in mind? I was thinking about an fcntl lock
> to ensure only one archiver is active in a given data directory. That
> would fix the problem without affec
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote:
> I wrote:
> > Well, the fact that there's only one archiver *now* doesn't mean there
> > wasn't more than one when the problem happened. The orphaned archiver
> > would eventually quit.
>
> But, actually, nevermind: we have explained the failure
I wrote:
> Well, the fact that there's only one archiver *now* doesn't mean there
> wasn't more than one when the problem happened. The orphaned archiver
> would eventually quit.
But, actually, nevermind: we have explained the failures you were seeing
in the test setup, but a multiple-active-arch
Jeff Frost <[EMAIL PROTECTED]> writes:
> Hurray! Unfortunately, the postmaster on the original troubled server almost
> never gets restarted, and in fact has only one archiver process running
> right now. Drat!
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't m
On Fri, 19 May 2006, Tom Lane wrote:
Well, there's our smoking gun. IIRC, all the failures you showed us are
consistent with race conditions caused by multiple archiver processes
all trying to do the same tasks concurrently.
Do you frequently stop and restart the postmaster? Because I don't s
Simon Riggs <[EMAIL PROTECTED]> writes:
> OK, I'm on it.
What solution have you got in mind? I was thinking about an fcntl lock
to ensure only one archiver is active in a given data directory. That
would fix the problem without affecting anything outside the archiver.
Not sure what's the most po
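
A rough sketch of the fcntl-lock interlock suggested above: take a non-blocking exclusive lock on a file inside the data directory, and refuse to act as archiver if another process already holds it. It is written in Python rather than C, and the function name and the lock file name "archiver.lock" are assumptions made for illustration, not anything PostgreSQL uses.

```python
import fcntl
import os


def try_acquire_archiver_lock(data_dir, lock_name="archiver.lock"):
    """Try to become the only archiver for data_dir.

    Returns an open file descriptor holding the lock, or None if another
    process already holds it. lock_name is a hypothetical file name.
    """
    path = os.path.join(data_dir, lock_name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock via fcntl(); fails at once if held.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    # Keep fd open for the life of the process; closing it drops the lock.
    return fd
```

The kernel releases the lock when the holding process exits, so a crashed or orphaned archiver cannot leave a stale lock behind. Locks of this kind are owned per process, so the check only guards against other processes, not against a second attempt from within the same one.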