> Alvaro Herrera <alvherre ( at ) commandprompt ( dot ) com> writes:
>> Maybe we could write a suitable test case using Martijn's concurrent
>> testing framework.
>
> The trick is to get process A to commit between the times that process B
> looks at the new and old versions of the pg_class row (and it has to
> happen to do so in that order ... although that's not a bad bet given
> the way btree handles equal keys).
>
> I think the reason we've not tracked this down before is that that's a
> pretty small window. You could force the problem by stopping process B
> with a debugger breakpoint and then letting A do its thing, but short of
> something like that you'll never reproduce it with high probability.

Actually I was already looking into a related issue and have some work
here that may help with this.

I wanted to test the online index build, and to do that I figured you
needed regression tests like the ones we have now, except with multiple
database sessions. So I hacked psql to issue queries asynchronously and
allow multiple database connections. That way you can switch connections
while a blocked or slow transaction is still running and issue queries
in other transactions.

I thought it was a proof-of-concept kludge, but actually it's worked out
quite well. There were a few conceptual gotchas, but I think I have a
reasonable solution for each.

The main issue was that any time you issue an asynchronous query that
you expect to block, you have a race condition in the test. You can't
switch connections and proceed right away, or you may actually proceed
with the other connection before the first connection's command is
received and acted on by the backend.

The "right" solution to this would involve altering the backend and the
protocol to provide some form of feedback when an asynchronous query had
reached various states, including when it was blocked. You would also
have to annotate it with enough information that the client can
determine it's actually blocked on the right thing and not just on some
uninteresting transient lock.

Instead I just added a command to cause psql to wait for a time. This is
nearly as good, since all the regression tests run fairly quickly: if
you wait even a fraction of a second you can be pretty certain the
command has been received, and if it were not going to block it would
have finished and printed output already. And it was *much* simpler.

Also, I think for interactive use we would want a somewhat more
sophisticated scheduling of output. It would be nice to print out
results as they come in even if we're on another connection.
For the regression tests you certainly do not want that, since it would
introduce unavoidable non-deterministic race conditions in your output
files all over the place. The way I've coded it now takes care to print
output only from the "active" database connection, and the test cases
need to be written to switch connections at each point where they want
to test for possibly incorrect output.

Another issue was that I couldn't come up with a nice set of names for
the commands that didn't conflict with the myriad of one-letter commands
already in psql. So I just prefixed them all with "c" (connection). I
figured when I submitted it I would just let the community hash out the
names and take the 2s it would take to change them.

The test cases are actually super easy to write and read, at least
considering we're talking about concurrent SQL sessions here. I think
it's far clearer than trying to handle separate scripts and nearly as
clear as Martin's proposal from a while back to prepend a connection
number on every line.

The commands I've added or altered are:

  \c[onnect][&] [DBNAME|- USER|- HOST|- PORT|-]
                 connect to new database (currently "postgres")
                 if optional & is present, do not close existing connection
  \cswitch n     switch to database connection n
  \clist         list database connections
  \cdisconnect   close current database connection
                 use \cswitch or \connect to select another connection
  \cnowait       issue next query without waiting for results
  \cwait [n]     if any queries are pending wait n seconds for results

Also I added %& to the psql prompt format to indicate the current
connection.

So the tests look like, for example:

postgres=# \c&
[2] You are now connected to database "postgres".
postgres[2]=# begin;
BEGIN
postgres[2]=# create table foo (a integer);
CREATE TABLE
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# select * from foo;
ERROR:  relation "foo" does not exist
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
postgres[2]=# commit;
COMMIT
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# select * from foo;
 a
---
(0 rows)

postgres[1]=# insert into foo values (1);
INSERT 0 1
postgres[1]=# begin;
BEGIN
postgres[1]=# update foo set a = 2;
UPDATE 1
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
postgres[2]=# select * from foo;
 a
---
 1
(1 row)

postgres[2]=# \cnowait
postgres[2]=# update foo set a = 3;
postgres[2]=# \cwait .1
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# commit;
COMMIT
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
UPDATE 1
postgres[2]=# \clist
[1] Connected to database "postgres"
[2] Connected to database "postgres"
postgres[2]=# \cdisconnect
Disconnecting from database (use \connect to reconnect or \cswitch to select another connection)
!> \cswitch 1
[1] You are now connected to database "postgres"

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
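
For anyone who wants to play with the timing idea behind \cnowait and
\cwait without a patched psql, here is a minimal sketch in Python. It is
only an analogy and not the psql code: there is no database involved, a
worker thread stands in for session 2, and a threading.Lock stands in
for the row lock held by session 1. All function and variable names are
made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical stand-in for the row lock taken by session 1's UPDATE.
row_lock = threading.Lock()


def update_row():
    """Session 2's UPDATE: blocks until session 1 releases the lock."""
    with row_lock:
        return "UPDATE 1"


def wait_for_result(future, seconds):
    """Rough analogue of \\cwait: wait up to `seconds` for a pending result.

    Returns the result if it arrived in time, or None if the command is
    still pending (and is therefore presumed to be blocked).
    """
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        return None


def run_scenario():
    """Replay the blocked-UPDATE part of the session with threads."""
    seen = []
    with ThreadPoolExecutor(max_workers=1) as session2:
        row_lock.acquire()                     # session 1: begin; update foo ...
        pending = session2.submit(update_row)  # session 2: \cnowait, then update
        seen.append(wait_for_result(pending, 0.1))  # \cwait .1: nothing yet
        row_lock.release()                     # session 1: commit
        seen.append(wait_for_result(pending, 5))    # result shows up now
    return seen


if __name__ == "__main__":
    print(run_scenario())
```

The first wait times out because the worker is stuck behind the lock,
and the second returns once the lock is released. As with \cwait, a
timeout cannot tell "blocked" apart from "merely slow"; it only works
because the commands involved would otherwise finish almost immediately.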