On Fri, 9 Oct 2020 at 13:33, Andrew Dunstan <and...@dunslane.net> wrote:
> > On 10/8/20 3:46 AM, Peter Eisentraut wrote: > > I want to progress work on stored procedures returning multiple result > > sets. Examples of how this could work on the SQL side have previously > > been shown [0]. We also have ongoing work to make psql show multiple > > result sets [1]. This appears to work fine in the simple query > > protocol. But the extended query protocol doesn't support multiple > > result sets at the moment [2]. This would be desirable to be able to > > use parameter binding, and also since one of the higher-level goals > > would be to support the use case of stored procedures returning > > multiple result sets via JDBC. > > > > [0]: > > > https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com > > [1]: https://commitfest.postgresql.org/29/2096/ > > [2]: > > https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us > > > > (Terminology: I'm calling this project "dynamic result sets", which > > includes several concepts: 1) multiple result sets, 2) those result > > sets can have different structures, 3) the structure of the result > > sets is decided at run time, not declared in the schema/procedure > > definition/etc.) > > > > One possibility I rejected was to invent a third query protocol beside > > the simple and extended one. This wouldn't really match with the > > requirements of JDBC and similar APIs because the APIs for sending > > queries don't indicate whether dynamic result sets are expected or > > required, you only indicate that later by how you process the result > > sets. So we really need to use the existing ways of sending off the > > queries. Also, avoiding a third query protocol is probably desirable > > in general to avoid extra code and APIs. > > > > So here is my sketch on how this functionality could be woven into the > > extended query protocol. I'll go through how the existing protocol > > exchange works and then point out the additions that I have in mind. > > > > These additions could be enabled by a _pq_ startup parameter sent by > > the client. Alternatively, it might also work without that because > > the client would just reject protocol messages it doesn't understand, > > but that's probably less desirable behavior. > > > > So here is how it goes: > > > > C: Parse > > S: ParseComplete > > > > At this point, the server would know whether the statement it has > > parsed can produce dynamic result sets. For a stored procedure, this > > would be declared with the procedure definition, so when the CALL > > statement is parsed, this can be noticed. I don't actually plan any > > other cases, but for the sake of discussion, perhaps some variant of > > EXPLAIN could also return multiple result sets, and that could also be > > detected from parsing the EXPLAIN invocation. > > > > At this point a client would usually do > > > > C: Describe (statement) > > S: ParameterDescription > > S: RowDescription > > > > New would be that the server would now also respond with a new > > message, say, > > > > S: DynamicResultInfo > > > > that indicates that dynamic result sets will follow later. The > > message would otherwise be empty. (We could perhaps include the > > number of result sets, but this might not actually be useful, and > > perhaps it's better not to spent effort on counting things that don't > > need to be counted.) > > > > (If we don't guard this by a _pq_ startup parameter from the client, > > an old client would now error out because of an unexpected protocol > > message.) > > > > Now the normal bind and execute sequence follows: > > > > C: Bind > > S: BindComplete > > (C: Describe (portal)) > > (S: RowDescription) > > C: Execute > > S: ... (DataRows) > > S: CommandComplete > > > > In the case of a CALL with output parameters, this "primary" result > > set contains one row with the output parameters (existing behavior). > > > > Now, if the client has seen DynamicResultInfo earlier, it should now > > go into a new subsequence to get the remaining result sets, like this > > (naming obviously to be refined): > > > > C: NextResult > > S: NextResultReady > > C: Describe (portal) > > S: RowDescription > > C: Execute > > .... > > S: CommandComplete > > C: NextResult > > ... > > C: NextResult > > S: NoNextResult > > C: Sync > > S: ReadyForQuery > > > > I think this would all have to use the unnamed portal, but perhaps > > there could be other uses with named portals. Some details to be > > worked out. > > > > One could perhaps also do without the DynamicResultInfo message and > > just put extra information into the CommandComplete message indicating > > "there are more result sets after this one". > > > > (Following the model from the simple query protocol, CommandComplete > > really means one result set complete, not the whole top-level command. > > ReadyForQuery means the whole command is complete. This is perhaps > > debatable, and interesting questions could also arise when considering > > what should happen in the simple query protocol when a query string > > consists of multiple commands each returning multiple result sets. > > But it doesn't really seem sensible to cater to that.) > > > > One thing that's missing in this sequence is a way to specify the > > desired output format (text/binary) for each result set. This could > > be added to the NextResult message, but at that point the client > > doesn't yet know the number of columns in the result set, so we could > > only do it globally. Then again, since the result sets are dynamic, > > it's less likely that a client would be coded to set per-column output > > codes. Then again, I would hate to bake such a restriction into the > > protocol, because some is going to try. (I suspect what would be more > > useful in practice is to designate output formats per data type.) So > > if we wanted to have this fully featured, it might have to look > > something like this: > > > > C: NextResult > > S: NextResultReady > > C: Describe (dynamic) (new message subkind) > > S: RowDescription > > C: Bind (zero parameters, optionally format codes) > > S: BindComplete > > C: Describe (portal) > > S: RowDescription > > C: Execute > > ... > > > > While this looks more complicated, client libraries could reuse > > existing code that starts processing with a Bind message and continues > > to CommandComplete, and then just loops back around. > > > > The mapping of this to libpq in a simple case could look like this: > > > > PQsendQueryParams(conn, "CALL ...", ...); > > PQgetResult(...); // gets output parameters > > PQnextResult(...); // new: sends NextResult+Bind > > PQgetResult(...); // and repeat > > > > Again, it's not clear here how to declare the result column output > > formats. Since libpq doesn't appear to expose the Bind message > > separately, I'm not sure what to do here. > > > > In JDBC, the NextResult message would correspond to the > > Statement.getMoreResults() method. It will need a bit of conceptual > > adjustment because the first result set sent on the protocol is > > actually the output parameters, which the JDBC API returns separately > > from a ResultSet, so the initial CallableStatement.execute() call will > > need to process the primary result set and then send NextResult and > > obtain the first dynamic result as the first ResultSet for its API, > > but that can be handled internally. > > > > Thoughts so far? > > > > > Exciting stuff. But I'm a bit concerned about the sequence of > resultsets. The JDBC docco for CallableStatement says: > > A CallableStatement can return one ResultSet object or multiple > ResultSet objects. Multiple ResultSet objects are handled using > operations inherited from Statement. > > For maximum portability, a call's ResultSet objects and update > counts should be processed prior to getting the values of output > parameters. > > And this is more or less in line with the pattern that I've seen when > converting SPs from other systems - the OUT params are usually set at > the end with things like status flags and error messages. > > If the OUT parameter resultset has to come first (which is how I read > your proposal - please correct me if I'm wrong) we'll have to stack up > all the resultsets until the SP returns, then send the OUT params, then > send the remaining resultsets. That seems ... suboptimal. The > alternative would be to send the OUT params last. That might result in > the driver needing to do some lookahead and caching, but I don't think > it's unmanageable. Of course, your protocol would also need changing. > > > cheers > > > andrew > > > -- > Andrew Dunstan > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > Currently the JDBC driver does NOT do : At this point a client would usually do > > C: Describe (statement) > S: ParameterDescription > S: RowDescription We do not do the Describe until we use a named statement and decide that the extra round trip is worth it. Making this assumption will cause a performance regression on all queries. If we are going to make a protocol change there are a number of other things the drivers want. https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md Thanks, Dave