Re: [BUGS] [GENERAL] cache lookup of relation 165058647 failed

Jan Wieck Thu, 06 May 2004 16:43:06 -0700

Sean Chittenden wrote:

I've found out that this error occurs in the dependency.c file:
2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
in the getRelationDescription(StringInfo buffer, Oid relid) function.
Any ideas what can cause these errors?
<aol>Me too.</aol> But I suspect that it's a race condition with the new background writer code. I've started testing a new database design and was able to reproduce this on my laptop nearly 90% of the time, but could only reproduce it about 10% of the time on my production databases, until I figured out what the difference was: fsync.
Temp tables don't use the shared buffer cache, so how can this be related to the BG writer?
Don't the system catalogs use the shared buffer cache?
BEGIN;
SELECT create_temp_table_func();
-- Inserts a row into pg_class via CREATE TEMP TABLE
-- Do other stuff
COMMIT;
-- After the commit, the row is now visible to other backends
-- disconnect
-- If the delay between the disconnect and reconnect is small enough
-- reconnect
-- It's as though there is a race condition that allows the function
-- pg_table_is_visible() to assert the "cache lookup of relation"
-- error.
BEGIN;
SELECT create_temp_table_func();
-- Before the CREATE TEMP TABLE, I call
/*
SELECT TRUE FROM pg_catalog.pg_class c
  LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relname = ''footmp''::TEXT AND c.relkind = ''r''::TEXT
    AND pg_catalog.pg_table_is_visible(c.oid);
*/
-- But the query fails
My guess was that the series of events went something like:
proc 0) COMMITs, and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page  [*]
proc 1) writes it to disk
proc 2) queries for the page  [*]
proc 1) syncs the fd
[*] proc 2 queries for the page at either of these points
In 7.4, there is no bgwriter or background process mucking with cache,

Except for the checkpoint process, which does exactly the same as the bgwriter does, and ALL concurrent backends whenever they feel the need to evict a dirty buffer.

If it makes a difference if a pg_class page is dirty in the buffer or copied out to disk with respect to visibility rules of the tuples contained in it, then the whole thing is a way larger bug than the one in MIB. First of all, committed or not, a temp object from one session should NEVER be visible in any other.
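
To make that argument concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: ToyBufferManager, observe_all_states and the page name "pg_class_page" are all made up for illustration. The assumption it encodes is that every reader goes through the buffer manager, so a row must look the same whether its page is dirty in the cache, already flushed, or evicted and read back from disk.

```python
# Toy model only; none of these names exist in PostgreSQL.
class ToyBufferManager:
    """Toy shared buffer cache: every read and write goes through it."""

    def __init__(self):
        self.disk = {}    # page_id -> list of rows "on disk"
        self.cache = {}   # page_id -> {"rows": [...], "dirty": bool}

    def _load(self, page_id):
        # Bring the page into the cache from disk if it is not already there.
        if page_id not in self.cache:
            rows = list(self.disk.get(page_id, []))
            self.cache[page_id] = {"rows": rows, "dirty": False}
        return self.cache[page_id]

    def insert(self, page_id, row):
        page = self._load(page_id)
        page["rows"].append(row)
        page["dirty"] = True

    def read(self, page_id):
        return list(self._load(page_id)["rows"])

    def flush(self, page_id):
        # Stand-in for a checkpoint, a bgwriter, or a backend writing a page.
        page = self.cache[page_id]
        self.disk[page_id] = list(page["rows"])
        page["dirty"] = False

    def evict(self, page_id):
        # A dirty page is written out before it is dropped from the cache.
        if self.cache[page_id]["dirty"]:
            self.flush(page_id)
        del self.cache[page_id]


def observe_all_states():
    """Read the same page while dirty, after a flush, and after eviction."""
    bufmgr = ToyBufferManager()
    bufmgr.insert("pg_class_page", "footmp")
    seen = [bufmgr.read("pg_class_page")]       # dirty, only in the cache
    bufmgr.flush("pg_class_page")
    seen.append(bufmgr.read("pg_class_page"))   # clean in cache, also on disk
    bufmgr.evict("pg_class_page")
    seen.append(bufmgr.read("pg_class_page"))   # reloaded from disk
    return seen
```

In this model all three reads return the same rows. If the real system behaved differently at any of those points, that would be the "way larger bug", independent of who does the writing.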

Jan

which is why this works 100% of the time. In 7.5, however, there's a 200ms gap where a race condition appears and pg_table_is_visible() fails its PointerIsValid() check. If I put a sleep in, the sleep gives the bgwriter enough time to commit the pages to disk so that the queries for the page happen after the fd's been sync()'ed.

I have no other clue as to why this would be happening though, so believe me when I say, I could very well be quite wrong.... but this is my best, quasi-educated/grep(1)'ed guess.
-sc

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

