Unfortunately, the original test environment has been blown away in favor of 
testing the 8.1 beta release.  I can confirm that the problem exists on a build 
of the 8.1 beta.  If it would be helpful, I could set it up again on 8.0.3 to 
confirm.  I THINK it was actually the tip of the 8.0 stable branch, as opposed 
to the 8.0.3 release proper.
 
We have a little more information about the failure pattern -- when we see 
these failures, it is always after a rollback on the thread that eventually 
generates the serialization error.  So I think the pattern is as follows 
(rough JDBC sketch after the list):
 
ConnectionA:
  -  A series of insert/update/deletes (on tables OTHER than the progress table).
  -  Update the progress table.
  -  Commit the transaction.
ConnectionB:
  -  A series of insert/update/deletes (on tables OTHER than the progress table) fails.
  -  Rollback the transaction.
  -  Attempt each insert/update/delete individually.  Commit or rollback each as we go.
  -  Attempt to update the progress table -- fail on serialization error.
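
In case it helps to see it concretely, here is a minimal JDBC sketch of that
pattern.  The table and column names (work_item, progress, last_id, source)
are invented for illustration, and I have assumed serializable isolation since
that is the mode in which we see the failures; the real code is more involved.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class PatternSketch {

    // ConnectionA's path: a batch of work, then the progress update, one commit.
    static void connectionA(Connection conn) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.setAutoCommit(false);
        Statement st = conn.createStatement();
        st.executeUpdate("INSERT INTO work_item (id, payload) VALUES (1, 'a')");
        st.executeUpdate("UPDATE progress SET last_id = 1 WHERE source = 'A'");
        st.close();
        conn.commit();
    }

    // ConnectionB's path: the batch fails, roll back, retry each statement in
    // its own transaction, then update the progress table -- which is where
    // the serialization error (SQLSTATE 40001) shows up.
    static void connectionB(Connection conn, String[] statements) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.setAutoCommit(false);
        Statement st = conn.createStatement();
        try {
            for (int i = 0; i < statements.length; i++) {
                st.executeUpdate(statements[i]);
            }
            conn.commit();
        } catch (SQLException batchFailure) {
            conn.rollback();
            // Retry each statement in its own transaction.
            for (int i = 0; i < statements.length; i++) {
                try {
                    st.executeUpdate(statements[i]);
                    conn.commit();
                } catch (SQLException individualFailure) {
                    conn.rollback();
                }
            }
            // This is where the serialization error is reported.
            st.executeUpdate("UPDATE progress SET last_id = 2 WHERE source = 'B'");
            conn.commit();
        } finally {
            st.close();
        }
    }
}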
 
To avoid any ambiguity in my earlier posts -- introducing even a very small 
delay between the operations on ConnectionA and ConnectionB makes the 
serialization error very infrequent, and introducing a larger delay seems to 
make it go away entirely.  I hate to consider that a solution, however.
 
I'm afraid I'm not familiar with a good way to capture the stream of 
communications with the database server.  If you could point me in the right 
direction, I'll give it my best shot.
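
In the meantime, the only handle I know of at the JDBC level is the standard
DriverManager trace hook, sketched below.  How much protocol-level detail (if
any) the PostgreSQL driver actually writes to that log is version-dependent,
so this may well not be what you're after; the connection URL and credentials
are placeholders.

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;

public class TraceSketch {
    public static void main(String[] args) throws Exception {
        // Standard JDBC hook: cooperating drivers send their diagnostic
        // output to this writer.
        PrintWriter trace = new PrintWriter(new FileWriter("jdbc-trace.log"), true);
        DriverManager.setLogWriter(trace);

        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "user", "password");

        // ... run the failing ConnectionA/ConnectionB sequence here ...

        conn.close();
        trace.close();
    }
}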
 
I did just have a thought, though -- is there any chance that the JDBC 
Connection.commit is returning as soon as the command is written to the TCP 
buffer, and I'm getting hurt by network latency -- the Nagle algorithm or some 
such?  (I assume the driver waits for a response from the server before 
returning, so this shouldn't be the issue.)  At the point that the commit 
confirmation is sent by the server, I assume the shared memory changes are 
visible to the other processes?
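
If it would help rule that out, I could run something like the sketch below:
commit on one connection and, the instant commit() returns, query from a
second already-open connection to see whether the row is visible.  The
commit_probe table is invented just for this test.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class CommitVisibilitySketch {

    // Returns true if a row committed on connA is visible on connB as soon as
    // connA.commit() has returned.  Both connections are assumed to be open
    // already, with no transaction in progress on connB.
    static boolean committedRowVisible(Connection connA, Connection connB)
            throws Exception {
        connA.setAutoCommit(false);
        connB.setAutoCommit(false);

        Statement write = connA.createStatement();
        write.executeUpdate("INSERT INTO commit_probe (id) VALUES (42)");
        write.close();

        long before = System.currentTimeMillis();
        connA.commit();                 // should block until the server confirms
        System.out.println("commit() took "
                + (System.currentTimeMillis() - before) + " ms");

        // connB's snapshot is taken at its first statement, i.e. right now,
        // so the committed row should already be visible.
        PreparedStatement read =
                connB.prepareStatement("SELECT count(*) FROM commit_probe WHERE id = 42");
        ResultSet rs = read.executeQuery();
        rs.next();
        boolean visible = rs.getInt(1) > 0;
        rs.close();
        read.close();
        connB.commit();                 // close out connB's transaction
        return visible;
    }
}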
 
-Kevin
 
 
>>> Tom Lane <[EMAIL PROTECTED]> 08/26/05 12:16 PM >>>
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> What happens if the timestamp of the commit is an exact match for the
> timestamp of the next transaction start?  What is the resolution of
> the time sampling?

It's not done via timestamps: rather, each transaction takes a census
of the transaction XIDs that are running in other backends when it
starts (there is an array in shared memory that lets it get this
information cheaply).  Reliability of the system clock is not a factor.

Are you sure the server is 8.0.3?  There was a bug in prior releases
that might possibly be related:

2005-05-07 17:22  tgl

        * src/backend/utils/time/: tqual.c (REL7_3_STABLE), tqual.c
        (REL7_4_STABLE), tqual.c (REL7_2_STABLE), tqual.c (REL8_0_STABLE),
        tqual.c: Adjust time qual checking code so that we always check
        TransactionIdIsInProgress before we check commit/abort status. 
        Formerly this was done in some paths but not all, with the result
        that a transaction might be considered committed for some purposes
        before it became committed for others.  Per example found by Jan
        Wieck.

My recollection though is that this only affected applications that were
using SELECT FOR UPDATE.  In any case, it's pretty hard to see how this
would affect an application that is in fact waiting for the backend to
report commit-done before it launches the next transaction; the
race-condition window we were concerned about no longer exists by the
time the backend sends CommandComplete.  So my suspicion remains fixed
on that point.  Do you have any way of sniffing the network traffic of
the middle-tier to confirm that it's doing what it's supposed to?

                        regards, tom lane

