On Tue, Aug 16, 2011 at 3:45 PM, Tom Lane wrote:
>> It would be nice to move the short-circuit test I recently inserted at
>> the top of SIGetDataEntries() somewhere higher up in the call stack,
>> but right now the layers of abstraction are so thick that it's not
>> exactly clear how to do that.
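The short-circuit test discussed here lets a backend skip the sinval locks when it has nothing to read. Below is a minimal C sketch of that idea. It assumes a per-reader "has messages" flag that writers set. Every name in it (`Queue`, `Reader`, `queue_put`, `get_entries`) is hypothetical, and a simple spinlock stands in for the real locks. It is a model of the technique, not the code in `SIGetDataEntries()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_CAP 64

/* Hypothetical shared queue; a spinlock stands in for the sinval locks. */
typedef struct {
    atomic_bool locked;
    int lock_acquisitions;  /* counts slow-path lock acquisitions */
    int max_msg;            /* number of messages written so far */
    int msgs[QUEUE_CAP];
} Queue;

/* Hypothetical per-backend read state. */
typedef struct {
    atomic_bool has_messages;  /* set by writers, cleared by this reader */
    int next_msg;              /* index of the next message to read */
} Reader;

static void lock_queue(Queue *q) {
    while (atomic_exchange(&q->locked, true))
        ;  /* spin until the previous holder releases the lock */
    q->lock_acquisitions++;
}

static void unlock_queue(Queue *q) {
    atomic_store(&q->locked, false);
}

/* Writer: append one message and flag the reader. Returns false when the
 * queue is full (overflow handling is out of scope for this sketch). */
bool queue_put(Queue *q, Reader *r, int msg) {
    lock_queue(q);
    bool ok = q->max_msg < QUEUE_CAP;
    if (ok) {
        q->msgs[q->max_msg++] = msg;
        atomic_store(&r->has_messages, true);
    }
    unlock_queue(q);
    return ok;
}

/* Reader: copies up to max_out pending messages into out[] and returns
 * how many were copied. */
int get_entries(Queue *q, Reader *r, int *out, int max_out) {
    assert(max_out > 0);
    /* Short-circuit: nothing pending, so skip the lock entirely. */
    if (!atomic_load(&r->has_messages))
        return 0;
    lock_queue(q);
    atomic_store(&r->has_messages, false);
    int n = 0;
    while (r->next_msg < q->max_msg && n < max_out)
        out[n++] = q->msgs[r->next_msg++];
    if (r->next_msg < q->max_msg)
        atomic_store(&r->has_messages, true);  /* more left for next call */
    unlock_queue(q);
    return n;
}
```

The unlocked check can race with a writer that is in the middle of appending. In this model, that only delays delivery: the writer sets the flag under the lock, so the reader picks the message up on its next call. This sketch does not cover how the real code handles memory ordering.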
Robert Haas writes:
> On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane wrote:
>> What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack,
>> which was meant to expose exactly this sort of problem, failed to do so
>> --- buildfarm member jaguar has been running with that flag for ages and
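`CLOBBER_CACHE_ALWAYS` is a compile-time debugging option. It makes the path that accepts invalidation messages also flush the system caches every time it runs. Below is a minimal sketch of that pattern. It uses hypothetical names and a toy cache flag in place of the real caches. The recursion guard matters because rebuilding a cache re-enters the invalidation path, much as reopening a catalog would.

```c
#include <assert.h>
#include <stdbool.h>

/* Enabled here so the sketch exercises the debug path; a real build would
 * pass it as a compiler flag, e.g. -DCLOBBER_CACHE_ALWAYS. */
#define CLOBBER_CACHE_ALWAYS 1

static bool cache_valid = false;  /* toy stand-in for the catalog caches */
static int flush_count = 0;       /* number of forced flushes so far */

void accept_invalidation_messages(void);

/* Flush and eagerly rebuild. The rebuild re-enters
 * accept_invalidation_messages(), as opening a catalog would. */
static void invalidate_all_caches(void) {
    cache_valid = false;
    flush_count++;
    accept_invalidation_messages();  /* reentrant call during rebuild */
    cache_valid = true;
}

void accept_invalidation_messages(void) {
    /* A real implementation would first drain queued sinval messages. */
#ifdef CLOBBER_CACHE_ALWAYS
    static bool in_recursion = false;
    if (!in_recursion) {
        in_recursion = true;
        invalidate_all_caches();
        in_recursion = false;
        assert(cache_valid);  /* the rebuild completed */
    }
#endif
}
```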
[adding back hackers so the thread shows the resolution]
On Sun, Aug 14, 2011 at 07:02:55PM -0400, Tom Lane wrote:
> Sounds good. Based on my own testing so far, I think that patch will
> probably make things measurably better for you, though it won't resolve
> every corner case.
The most recent
On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane wrote:
> I can reproduce the problem fairly conveniently with this crude hack:
>
> diff --git a/src/backend/storage/ipc/sinval.c
> b/src/backend/storage/ipc/sinval.c
> index 8499615..5ad2aee 100644
> *** a/src/backend/storage/ipc/sinval.c
> --- b/src/back
I wrote:
> I still haven't reproduced the behavior here, but I think I see what
> must be happening: we are getting an sinval reset while attempting to
> open pg_class_oid_index.
After a number of false starts, I've managed to reproduce this behavior
locally. The above theory turns out to be wron
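Some background on sinval resets follows. Each backend reads shared invalidation messages from a fixed-size circular queue. A backend that falls too far behind is not waited for. Instead, it is marked for a reset and must later throw away all of its cached catalog state. The C sketch below models that rule with a deliberately tiny queue. The names (`SinvalQueue`, `Backend`, `sinval_put`, `sinval_get`) are hypothetical, and returning -1 to signal a reset is this model's own convention.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MSGS 4  /* deliberately tiny so a reset is easy to trigger */

typedef struct {
    int max_msg;        /* number of the next message to be written */
    int buf[MAX_MSGS];  /* slot for message k is buf[k % MAX_MSGS] */
} SinvalQueue;

typedef struct {
    int next_msg;       /* number of the next message this backend reads */
    bool reset_state;   /* set when this backend fell too far behind */
} Backend;

/* Writer: before overwriting a slot, any backend that has not yet read
 * the message in it is marked for reset instead of being waited for. */
void sinval_put(SinvalQueue *q, Backend *backends, int nbackends, int msg) {
    for (int i = 0; i < nbackends; i++) {
        Backend *b = &backends[i];
        if (!b->reset_state && q->max_msg - b->next_msg >= MAX_MSGS) {
            b->reset_state = true;
            b->next_msg = q->max_msg;  /* its unread backlog is abandoned */
        }
    }
    q->buf[q->max_msg % MAX_MSGS] = msg;
    q->max_msg++;
}

/* Reader: returns -1 after a reset (the caller must discard every cached
 * catalog entry and rebuild), otherwise the number of messages copied. */
int sinval_get(SinvalQueue *q, Backend *b, int *out, int max_out) {
    assert(max_out > 0);
    if (b->reset_state) {
        b->reset_state = false;
        b->next_msg = q->max_msg;  /* the rebuild supersedes queued messages */
        return -1;
    }
    int n = 0;
    while (b->next_msg < q->max_msg && n < max_out) {
        out[n++] = q->buf[b->next_msg % MAX_MSGS];
        b->next_msg++;
    }
    return n;
}
```

The hard case in the thread is a reset that arrives while the backend is still in the middle of rebuilding cache entries, for example while opening a catalog index. This toy model does not attempt to reproduce that timing.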
daveg writes:
> Should this be applied in addition to the earlier patch, or to replace it?
Apply it instead of the earlier one.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postg
On Fri, Aug 05, 2011 at 12:10:31PM -0400, Tom Lane wrote:
> I wrote:
> > Ahh ... you know what, never mind about stack traces, let's just see if
> > the attached patch doesn't fix it.
>
> On reflection, that patch would only fix the issue for pg_class, and
> that's not the only catalog that gets c
I wrote:
> Ahh ... you know what, never mind about stack traces, let's just see if
> the attached patch doesn't fix it.
On reflection, that patch would only fix the issue for pg_class, and
that's not the only catalog that gets consulted during relcache reloads.
I think we'd better do it as attache
Ahh ... you know what, never mind about stack traces, let's just see if
the attached patch doesn't fix it.
I still haven't reproduced the behavior here, but I think I see what
must be happening: we are getting an sinval reset while attempting to
open pg_class_oid_index. The latter condition cause
daveg writes:
> On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
>> If this theory is correct then all of the file-related errors ought to
>> match up to recently-vacuumed mapped catalogs or indexes (those are the
>> ones with relfilenode = 0 in pg_class). Do you want to expand your
>> l
On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
> daveg writes:
> > We are seeing 'cannot read' and 'cannot open' errors too that would be
> > consistent with trying to use a vanished file.
>
> Yeah, these all seem consistent with the idea that the failing backend
> somehow missed an up
daveg writes:
> We are seeing 'cannot read' and 'cannot open' errors too that would be
> consistent with trying to use a vanished file.
Yeah, these all seem consistent with the idea that the failing backend
somehow missed an update for the relation mapping file. You would get
the "could not find
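For mapped catalogs and indexes, pg_class stores relfilenode = 0. A backend therefore has to consult the relation mapping file to find the physical file. The sketch below uses a small in-memory array in place of the real map file, and all names are hypothetical. It shows why a backend that misses a map update keeps resolving a mapped relation to its old filenode.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_CAP 8

/* Hypothetical model of a backend's private copy of the relation map. */
typedef struct { unsigned oid; unsigned filenode; } MapEntry;
typedef struct { MapEntry entries[MAP_CAP]; int n; } RelMap;

/* Hypothetical pg_class-like row: relfilenode == 0 marks a mapped relation. */
typedef struct { unsigned oid; unsigned relfilenode; } ClassRow;

/* Returns the filenode the map gives for oid, or 0 if oid is not mapped. */
unsigned map_lookup(const RelMap *map, unsigned oid) {
    for (int i = 0; i < map->n; i++)
        if (map->entries[i].oid == oid)
            return map->entries[i].filenode;
    return 0;
}

/* Applies a map update, as processing a map invalidation would. */
void map_update(RelMap *map, unsigned oid, unsigned filenode) {
    for (int i = 0; i < map->n; i++) {
        if (map->entries[i].oid == oid) {
            map->entries[i].filenode = filenode;
            return;
        }
    }
    assert(map->n < MAP_CAP);
    map->entries[map->n++] = (MapEntry){ oid, filenode };
}

/* Resolves the physical file using this backend's copy of the map.
 * If that copy is stale, the answer is the pre-rewrite filenode. */
unsigned physical_filenode(const ClassRow *row, const RelMap *backend_map) {
    assert(row != NULL && backend_map != NULL);
    if (row->relfilenode != 0)
        return row->relfilenode;                   /* ordinary relation */
    return map_lookup(backend_map, row->oid);      /* mapped catalog */
}
```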
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote:
> daveg writes:
> > Summary: the failing process reads 0 rows from 0 blocks from the OLD
> > relfilenode.
>
> Hmm. This seems to mean that we're somehow missing a relation mapping
> invalidation message, or perhaps not processing it soon
daveg writes:
> Summary: the failing process reads 0 rows from 0 blocks from the OLD
> relfilenode.
Hmm. This seems to mean that we're somehow missing a relation mapping
invalidation message, or perhaps not processing it soon enough during
some complex set of invalidations. I did some testing
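Dave's summary fits a picture in which a table rewrite has already emptied the old file: the failing process reads 0 rows from 0 blocks, and it reads them from the OLD relfilenode. Below is a minimal model of that picture. It assumes the old file is truncated to zero length when the rewrite commits rather than removed immediately. That is an assumption about storage-manager behavior, not something stated in this thread. The names are hypothetical.

```c
#include <assert.h>

#define MAX_FILES 16

/* Hypothetical in-memory storage: nblocks[f] is the length, in blocks,
 * of relation file f. */
typedef struct {
    int nblocks[MAX_FILES];
    int next_filenode;  /* next unused file number */
} Storage;

/* Rewrite in the style of VACUUM FULL/CLUSTER: copy the data into a new
 * file, then truncate the old file to zero length. Returns the new file. */
int rewrite_relation(Storage *s, int old_filenode) {
    assert(old_filenode >= 0 && old_filenode < MAX_FILES);
    assert(s->next_filenode < MAX_FILES);
    int new_filenode = s->next_filenode++;
    s->nblocks[new_filenode] = s->nblocks[old_filenode];
    s->nblocks[old_filenode] = 0;
    return new_filenode;
}

/* A sequential scan visits every block of the file it is pointed at;
 * returns the number of blocks visited. */
int scan_blocks(const Storage *s, int filenode) {
    assert(filenode >= 0 && filenode < MAX_FILES);
    return s->nblocks[filenode];
}
```

A backend still holding the old filenode scans zero blocks and finds no tuple. In this model, that failure does not raise any I/O error.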
On Wed, Aug 03, 2011 at 11:18:20AM -0400, Tom Lane wrote:
> Evidently not, if it's not logging anything, but now the question is
> why. One possibility is that for some reason RelationGetNumberOfBlocks
> is persistently lying about the file size. (We've seen kernel bugs
> before that resulted in
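One hypothesis raised here is that `RelationGetNumberOfBlocks` reports a wrong file size. Roughly, a relation's block count is the file length reported by the kernel divided by `BLCKSZ`. That description of the storage manager is an assumption made to motivate this sketch. The POSIX sketch below uses a hypothetical `count_blocks` helper. It shows that only the kernel-reported length feeds into the answer, which is why a kernel bug could make the count wrong.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192  /* PostgreSQL's default block size */

/* Number of whole BLCKSZ blocks in an open file, or -1 on error.
 * The result depends only on the length the kernel reports for the file. */
long count_blocks(FILE *f) {
    assert(f != NULL);
    int fd = fileno(f);
    off_t len = lseek(fd, 0, SEEK_END);
    if (len < 0)
        return -1;
    return (long)(len / BLCKSZ);
}
```

A trailing partial block is ignored, so a 8292-byte file counts as one block. Data still sitting in a `FILE *` buffer is invisible to `lseek`, so the sketch assumes writes go straight to the descriptor.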
daveg writes:
> We have installed the patch and have encountered the error as usual.
> However there is no additional output from the patch. I'm speculating
> that the pg_class scan in ScanPgRelationDetailed() fails to return
> tuples somehow.
Evidently not, if it's not logging anything, but now
On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote:
> daveg writes:
> > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> >> I think we need to start adding some instrumentation so we can get a
> >> better handle on what's going on in your database. If I were to send
> >> you a so
daveg writes:
> On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
>> I think we need to start adding some instrumentation so we can get a
>> better handle on what's going on in your database. If I were to send
>> you a source-code patch for the server that adds some more logging
>> printo
On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> daveg writes:
> > Here is the update: the problem happens with vacuum full alone, no reindex
> > is needed to trigger it. I updated the script to avoid reindexing after
> > vacuum. Over the past two days there are still many occurrences of
daveg writes:
> Here is the update: the problem happens with vacuum full alone, no reindex
> is needed to trigger it. I updated the script to avoid reindexing after
> vacuum. Over the past two days there are still many occurrences of this
> error coincident with the vacuum.
Well, that jibes with t
On Thu, Jul 28, 2011 at 11:31:31PM -0700, daveg wrote:
> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
> > REINDEX. My guess is that this is happening either right around the
> > time the VACUUM FULL commits or right around the time the REINDEX
> > commits. It'd be helpful to know
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote:
> The thing that was bizarre about the one instance in the buildfarm was
> that the error was persistent, ie, once a session had failed all its
> subsequent attempts to access pg_class failed too. I gather from Dave's
> description that it'
Robert Haas writes:
> On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote:
>> Well, no, because the ScanPgRelation call is not failing internally.
>> It's performing a seqscan of pg_class and not finding a matching tuple.
> SnapshotNow race?
That's what I would have guessed to start with, except t
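A SnapshotNow scan judges each tuple's visibility at the moment it reaches that tuple. A concurrent update can therefore commit after the scan has passed the slot that receives the new version but before it reaches the old one, and the scan sees neither version. The deterministic C sketch below simulates that interleaving. It uses hypothetical names and an array of slots in place of a heap.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 8

/* Hypothetical tuple slot: used = occupied, live = visible "right now". */
typedef struct { int key; bool live; bool used; } Slot;

/* Update: kill the old version and put the new one in the first free slot. */
static void update_tuple(Slot *t, int old_slot) {
    t[old_slot].live = false;
    for (int i = 0; i < NSLOTS; i++) {
        if (!t[i].used) {
            t[i] = (Slot){ t[old_slot].key, true, true };
            return;
        }
    }
}

/* Sequential scan that checks visibility "as of now" at each slot.
 * If update_at >= 0, an update of slot `victim` commits just before the
 * scan visits slot update_at (simulated concurrency). */
bool scan_finds(Slot *t, int key, int update_at, int victim) {
    assert(victim >= 0 && victim < NSLOTS);
    for (int i = 0; i < NSLOTS; i++) {
        if (i == update_at)
            update_tuple(t, victim);
        if (t[i].used && t[i].live && t[i].key == key)
            return true;
    }
    return false;
}
```

A miss of this kind is transient: scanning again after the update finds the new version. Tom notes elsewhere in the thread that the buildfarm failure was persistent, which a one-off race like this does not explain by itself.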
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote:
>>> The thing that was bizarre about the one instance in the buildfarm was
>>> that the error was persistent, ie, once a session had failed all its
>>> subsequent attempts
Robert Haas writes:
> On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote:
>> The thing that was bizarre about the one instance in the buildfarm was
>> that the error was persistent, ie, once a session had failed all its
>> subsequent attempts to access pg_class failed too.
> I was thinking more alo
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote:
> daveg writes:
>> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
>>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which
>>> means that a REINDEX is happening as part of the same operation. In
>>> 9.0, there's no p
daveg writes:
> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which
>> means that a REINDEX is happening as part of the same operation. In
>> 9.0, there's no point in doing VACUUM FULL immediately followed by
>> REI
On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
> On Thu, Jul 28, 2011 at 5:46 PM, daveg wrote:
> > On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote:
> >> On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote:
> >> > My client has been seeing regular instances of the following sort
On Thu, Jul 28, 2011 at 5:46 PM, daveg wrote:
> On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote:
>> On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote:
>> > My client has been seeing regular instances of the following sort of
>> > problem:
>> On what version of PostgreSQL?
>
> 9.0.4.
>
> I
On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote:
> On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote:
> > My client has been seeing regular instances of the following sort of
> > problem:
> On what version of PostgreSQL?
9.0.4.
I previously said:
> > This occurs on postgresql 9.0.4. on 3
On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote:
> My client has been seeing regular instances of the following sort of problem:
On what version of PostgreSQL?
> If simplicity worked, the world would be overrun with insects.
I thought it was... :-)
--
Robert Haas
EnterpriseDB: http://www.enterp
My client has been seeing regular instances of the following sort of problem:
...
03:06:09.453 exec_simple_query, postgres.c:900
03:06:12.042 XX000: could not find pg_class tuple for index 2662 at character 13
03:06:12.042 RelationReloadIndexInfo, relcache.c:1740
03:06:12.042 INSERT INTO zzz