I want to progress work on stored procedures returning multiple result sets. Examples of how this could work on the SQL side have previously been shown [0]. We also have ongoing work to make psql show multiple result sets [1]. This appears to work fine in the simple query protocol. But the extended query protocol doesn't support multiple result sets at the moment [2]. This would be desirable to be able to use parameter binding, and also since one of the higher-level goals would be to support the use case of stored procedures returning multiple result sets via JDBC.

[0]: https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us

(Terminology: I'm calling this project "dynamic result sets", which includes several concepts: 1) multiple result sets, 2) those result sets can have different structures, 3) the structure of the result sets is decided at run time, not declared in the schema/procedure definition/etc.)

One possibility I rejected was to invent a third query protocol beside the simple and extended one. This wouldn't really match with the requirements of JDBC and similar APIs because the APIs for sending queries don't indicate whether dynamic result sets are expected or required, you only indicate that later by how you process the result sets. So we really need to use the existing ways of sending off the queries. Also, avoiding a third query protocol is probably desirable in general to avoid extra code and APIs.

So here is my sketch on how this functionality could be woven into the extended query protocol. I'll go through how the existing protocol exchange works and then point out the additions that I have in mind.

These additions could be enabled by a _pq_ startup parameter sent by the client. Alternatively, it might also work without that because the client would just reject protocol messages it doesn't understand, but that's probably less desirable behavior.

So here is how it goes:

C: Parse
S: ParseComplete

At this point, the server would know whether the statement it has parsed can produce dynamic result sets. For a stored procedure, this would be declared with the procedure definition, so when the CALL statement is parsed, this can be noticed. I don't actually plan any other cases, but for the sake of discussion, perhaps some variant of EXPLAIN could also return multiple result sets, and that could also be detected from parsing the EXPLAIN invocation.

At this point a client would usually do

C: Describe (statement)
S: ParameterDescription
S: RowDescription

New would be that the server would now also respond with a new message, say,

S: DynamicResultInfo

that indicates that dynamic result sets will follow later. The message would otherwise be empty. (We could perhaps include the number of result sets, but this might not actually be useful, and perhaps it's better not to spent effort on counting things that don't need to be counted.)

(If we don't guard this by a _pq_ startup parameter from the client, an old client would now error out because of an unexpected protocol message.)

Now the normal bind and execute sequence follows:

C: Bind
S: BindComplete
(C: Describe (portal))
(S: RowDescription)
C: Execute
S: ... (DataRows)
S: CommandComplete

In the case of a CALL with output parameters, this "primary" result set contains one row with the output parameters (existing behavior).

Now, if the client has seen DynamicResultInfo earlier, it should now go into a new subsequence to get the remaining result sets, like this (naming obviously to be refined):

C: NextResult
S: NextResultReady
C: Describe (portal)
S: RowDescription
C: Execute
....
S: CommandComplete
C: NextResult
...
C: NextResult
S: NoNextResult
C: Sync
S: ReadyForQuery

I think this would all have to use the unnamed portal, but perhaps there could be other uses with named portals. Some details to be worked out.

One could perhaps also do without the DynamicResultInfo message and just put extra information into the CommandComplete message indicating "there are more result sets after this one".

(Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command. ReadyForQuery means the whole command is complete. This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets. But it doesn't really seem sensible to cater to that.)

One thing that's missing in this sequence is a way to specify the desired output format (text/binary) for each result set. This could be added to the NextResult message, but at that point the client doesn't yet know the number of columns in the result set, so we could only do it globally. Then again, since the result sets are dynamic, it's less likely that a client would be coded to set per-column output codes. Then again, I would hate to bake such a restriction into the protocol, because some is going to try. (I suspect what would be more useful in practice is to designate output formats per data type.) So if we wanted to have this fully featured, it might have to look something like this:

C: NextResult
S: NextResultReady
C: Describe (dynamic) (new message subkind)
S: RowDescription
C: Bind (zero parameters, optionally format codes)
S: BindComplete
C: Describe (portal)
S: RowDescription
C: Execute
...

While this looks more complicated, client libraries could reuse existing code that starts processing with a Bind message and continues to CommandComplete, and then just loops back around.

The mapping of this to libpq in a simple case could look like this:

PQsendQueryParams(conn, "CALL ...", ...);
PQgetResult(...);  // gets output parameters
PQnextResult(...);  // new: sends NextResult+Bind
PQgetResult(...);  // and repeat

Again, it's not clear here how to declare the result column output formats. Since libpq doesn't appear to expose the Bind message separately, I'm not sure what to do here.

In JDBC, the NextResult message would correspond to the Statement.getMoreResults() method. It will need a bit of conceptual adjustment because the first result set sent on the protocol is actually the output parameters, which the JDBC API returns separately from a ResultSet, so the initial CallableStatement.execute() call will need to process the primary result set and then send NextResult and obtain the first dynamic result as the first ResultSet for its API, but that can be handled internally.

Thoughts so far?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to