Re: [HACKERS] Going for "all green" buildfarm results

stark Thu, 17 Aug 2006 10:15:53 -0700

> Alvaro Herrera <alvherre ( at ) commandprompt ( dot ) com> writes:
>> Maybe we could write a suitable test case using Martijn's concurrent
>> testing framework.
> 
> The trick is to get process A to commit between the times that process B
> looks at the new and old versions of the pg_class row (and it has to
> happen to do so in that order ... although that's not a bad bet given
> the way btree handles equal keys).
> 
> I think the reason we've not tracked this down before is that that's a
> pretty small window.  You could force the problem by stopping process B
> with a debugger breakpoint and then letting A do its thing, but short of
> something like that you'll never reproduce it with high probability.


Actually I was already looking into a related issue and have some work here
that may help with this.

I wanted to test the online index build and to do that I figured you needed to
have regression tests like the ones we have now except with multiple database
sessions. So I hacked psql to issue queries asynchronously and allow multiple
database connections. That way you can switch connections while a blocked or
slow transaction is still running and issue queries in other transactions.

I thought it was a proof-of-concept kludge but actually it's worked out quite
well. There were a few conceptual gotchas but I think I have a reasonable
solution for each.

The main issue was that any time you issue an asynchronous query that
you expect to block you have a race condition in the test. You can't switch
connections and proceed right away or you may actually proceed with the other
connection before the first connection's command is received and acted on by
the backend.

The "right" solution to this would involve altering the backend and the
protocol to provide some form of feedback when an asynchronous query had
reached various states including when it was blocked. You would have to
annotate it with enough information that the client can determine it's
actually blocked on the right thing and not just on some uninteresting
transient lock too.

Instead I just added a command to cause psql to wait for a time. This is
nearly as good since all the regression tests run fairly quickly so if you
wait even a fraction of a second you can be pretty certain the command has
been received and if it were not going to block it would have finished and
printed output already. And it was *much* simpler.
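To illustrate the timing argument, here is a minimal Python sketch. It is not
code from the patch: threads stand in for backends, a threading.Lock stands in
for the row lock, and the names session2_update and issue_nowait_then_wait are
made up for the example. It issues a statement in the background, waits a
fraction of a second, and treats "still pending" as "blocked".

```python
import threading

# Stand-in for the row lock that session 1 holds while its transaction is open.
row_lock = threading.Lock()


def session2_update(done):
    # Stand-in for "update foo set a = 3": blocks until the row lock is free.
    with row_lock:
        done.set()


def issue_nowait_then_wait(seconds):
    # Start the statement in the background (the role \cnowait plays), then
    # wait up to `seconds` for it to finish (the role \cwait plays).
    done = threading.Event()
    worker = threading.Thread(target=session2_update, args=(done,))
    worker.start()
    finished = done.wait(timeout=seconds)
    return worker, done, finished


row_lock.acquire()                     # session 1: begin; update foo set a = 2;
worker, done, finished = issue_nowait_then_wait(0.1)
blocked_before_commit = not finished   # still pending, so presumably blocked
row_lock.release()                     # session 1: commit;
worker.join()
completed_after_commit = done.is_set()
print(blocked_before_commit, completed_after_commit)
```

The wait only shows that the statement has not finished yet, not why, which is
the gap the protocol-level feedback described above would close.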

Also, I think for interactive use we would want a somewhat more sophisticated
scheduling of output. It would be nice to print out results as they come in
even if we're on another connection. For the regression tests you certainly do
not want that since that would introduce unavoidable non-deterministic race
conditions in your output files all over the place. The way I've coded it now
takes care to print out output only from the "active" database connection and
the test cases need to be written to switch connections at each point they
want to test for possibly incorrect output.
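A rough Python sketch of that output rule follows. It is an illustration under
my own assumptions, not the psql code: the class ConnectionOutputs and its
methods are hypothetical. Results that arrive for a connection other than the
active one are held back and only printed when the test switches to that
connection.

```python
class ConnectionOutputs:
    """Hold results per connection; print only for the active connection."""

    def __init__(self):
        self.pending = {}    # connection number -> results not yet printed
        self.active = None   # currently selected connection
        self.printed = []    # what would have reached the output file, in order

    def open(self, conn_id):
        # A newly opened connection becomes the active one.
        self.pending[conn_id] = []
        self.active = conn_id

    def result_arrived(self, conn_id, text):
        # Print at once for the active connection, otherwise hold it back.
        if conn_id == self.active:
            self.printed.append(text)
        else:
            self.pending[conn_id].append(text)

    def switch(self, conn_id):
        # Switching flushes whatever arrived while the connection was inactive.
        self.active = conn_id
        self.printed.extend(self.pending[conn_id])
        self.pending[conn_id] = []


out = ConnectionOutputs()
out.open(1)
out.open(2)
out.switch(1)
out.result_arrived(1, "COMMIT")     # active connection: printed immediately
out.result_arrived(2, "UPDATE 1")   # inactive connection: held back
before_switch = list(out.printed)
out.switch(2)                       # held result is printed now
print(before_switch, out.printed)
```

Because nothing from an inactive connection is printed until the script
switches to it, the order of lines in the output file depends only on the test
script and not on when the backend happened to respond.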

Another issue was that I couldn't come up with a nice set of names for the
commands that didn't conflict with the myriad of one-letter commands already
in psql. So I just prefixed them all with "c" (connection). I figured when I
submitted it I would just let the community hash out the names and take the 2s
it would take to change them.

The test cases are actually super easy to write and read, at least considering
we're talking about concurrent sql sessions here. I think it's far clearer
than trying to handle separate scripts and nearly as clear as Martin's
proposal from a while back to prepend a connection number on every line.

The commands I've added or altered are:

  \c[onnect][&] [DBNAME|- USER|- HOST|- PORT|-]
                connect to new database (currently "postgres")
                if optional & is present, open a new connection without
                closing the existing one
  \cswitch n
                switch to database connection n
  \clist
                list database connections
  \cdisconnect
                close current database connection
                use \cswitch or \connect to select another connection
  \cnowait
                issue next query without waiting for results
  \cwait [n]
                if any queries are pending wait n seconds for results

Also I added %& to the psql prompt format to indicate the current connection.

So the tests look like, for example:

postgres=# \c&
[2] You are now connected to database "postgres".
postgres[2]=# begin;
BEGIN
postgres[2]=# create table foo (a integer);
CREATE TABLE
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# select * from foo;
ERROR:  relation "foo" does not exist
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
postgres[2]=# commit;
COMMIT
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# select * from foo;
 a 
---
(0 rows)

postgres[1]=# insert into foo values (1);
INSERT 0 1
postgres[1]=# begin;
BEGIN
postgres[1]=# update foo set a = 2;
UPDATE 1
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
postgres[2]=# select * from foo;
 a 
---
 1
(1 row)

postgres[2]=# \cnowait
postgres[2]=# update foo set a = 3;
postgres[2]=# \cwait .1
postgres[2]=# \cswitch 1
[1] You are now connected to database "postgres"
postgres[1]=# commit;
COMMIT
postgres[1]=# \cswitch 2
[2] You are now connected to database "postgres"
UPDATE 1
postgres[2]=# \clist
[1] Connected to database "postgres"
[2] Connected to database "postgres"
postgres[2]=# \cdisconnect
Disconnecting from database (use \connect to reconnect or \cswitch to select another connection)
!> \cswitch 1
[1] You are now connected to database "postgres"


-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


