Hi All,
I've been getting these errors ("ERROR: cache lookup failed for relation 17442") in my logs for a while now. It originally seemed like a hardware problem, but now we are getting them pretty consistently on a couple of servers. I've scaled the schema down to the one table and the function involved, and included a code snippet that opens a bunch of connections and loops, calling the same function. It usually takes 100-2000 iterations before these messages start appearing in the log. I've also included the original function; with it, the error takes 10,000 iterations to start showing. I should note that we've been getting these errors since version 7; this is the first time they have been reproducible.
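For readers without the attachment, a minimal sketch of such a driver loop might look like the following. It is an assumption, not the attached snippet: `run_until_error` and `call_once` are hypothetical names, and `call_once` stands in for whatever opens a connection and calls the function (for example through a database driver). The sketch only shows the looping and the bookkeeping of the first failing iteration.

```python
import threading
from typing import Callable, Optional, Tuple


def run_until_error(call_once: Callable[[int, int], None],
                    workers: int = 8,
                    max_iterations: int = 10_000) -> Optional[Tuple[int, int, str]]:
    """Run call_once(worker_id, iteration) in parallel loops.

    Returns (worker_id, iteration, message) for the first failure recorded,
    or None if every worker finished without raising.
    """
    first_failure = []            # holds at most one entry
    lock = threading.Lock()
    stop = threading.Event()      # tells the other workers to give up

    def worker(worker_id: int) -> None:
        for iteration in range(1, max_iterations + 1):
            if stop.is_set():
                return
            try:
                # Hypothetical hook: in a real test this would call the
                # database function over this worker's own connection.
                call_once(worker_id, iteration)
            except Exception as exc:
                with lock:
                    if not first_failure:
                        first_failure.append((worker_id, iteration, str(exc)))
                stop.set()
                return

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first_failure[0] if first_failure else None
```

Passing the database call in as a function keeps the loop independent of any particular driver, so the same harness can be pointed at either the reduced function or the original one.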
With the original function, the log messages were slightly different and usually caused the server to reset:

ERROR: type "t" already exists
ERROR: duplicate key violates unique constraint "pg_type_typname_nsp_index"
ERROR: duplicate key violates unique constraint "pg_type_typname_nsp_index"
ERROR: duplicate key violates unique constraint "pg_type_typname_nsp_index"
CONTEXT: SQL statement "create temp table tmp_children ( uniqid bigint, memberid bigint, membertype varchar(50), ownerid smallint, tag varchar(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
ERROR: duplicate key violates unique constraint "pg_type_typname_nsp_index"
ERROR: cache lookup failed for type 2449707570
FATAL: cache lookup failed for type 2449707570
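To see which of these messages dominates over a long run, the log can be tallied by message type. The following is a rough sketch under the assumption that each log entry starts with `ERROR:` or `FATAL:` as in the excerpt above; `tally_errors` is a hypothetical helper name, and standalone numbers (such as OIDs) are replaced with `N` so the same failure on different objects is counted together.

```python
import re
from collections import Counter

# Matches lines such as: ERROR: cache lookup failed for type 2449707570
ENTRY = re.compile(r"^(ERROR|FATAL):\s+(.*)$")


def tally_errors(lines):
    """Count ERROR/FATAL log lines, grouping messages that differ only by number."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue  # CONTEXT lines and anything else are skipped
        severity, message = match.groups()
        normalized = re.sub(r"\b\d+\b", "N", message)
        counts[(severity, normalized)] += 1
    return counts


sample = [
    'ERROR: type "t" already exists',
    'ERROR: duplicate key violates unique constraint "pg_type_typname_nsp_index"',
    'ERROR: cache lookup failed for relation 17442',
    'ERROR: cache lookup failed for type 2449707570',
    'FATAL: cache lookup failed for type 2449707570',
]
for (severity, message), count in tally_errors(sample).most_common():
    print(count, severity, message)
```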
Environment info: Postgres v8, SUSE Linux with the latest kernel patches, filesystem: reiserfs.
Please let me know if you need any more information. No data is needed, just the included schema.
Thanks,
Michael
Michael,
The interesting thing about this bug is that we had the same problem on a customer's machine some time ago. It occurred after a certain script (nothing big) had been run for roughly the 100,001st time on an empty database. So this one does not seem to be related to the schema; it is more or less random ...
We copied the data directory from the customer, but we were not able to reproduce the same behaviour on a different machine.
The strange thing is that after doing a checkpoint and restarting the database, the problem still occurred on the original machine. Starting the same binary on a different machine did not show that error ...
We stepped through it with gdb but we could not find anything strange ...
Can you reliably reproduce the problem after an arbitrary number of iterations on a different machine? We couldn't ...
Looking at the code: This is a null pointer caught by the system ... Something seems to corrupt memory ...
Hans
-- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/660/816 40 77 www.cybertec.at, www.postgresql.at