First a general comment:

> Then the driver writers that need these special API behaviors are
> reasonably expected to contribute to adding them to backend products that
> do not already have them. The database developers are not going to take on
> responsibility for the API decisions of others; and features deemed (or
> that in reality are) of marginal usefulness are likely to be omitted -
> intentionally or otherwise - from the official (in this case libpq)
> protocol.
I absolutely agree with you there; I'm not trying to get anybody to
implement something I need (i.e. fetch 0 rows). This is more of a general
discussion as to whether that feature *makes sense* to you as a protocol
feature (which doesn't seem to be the case, as some of you guys want to
deprecate the whole max_rows thing).

>> I'll clarify just a little... I am indeed talking about the PostgreSQL
>> network protocol, and not about query optimization (with LIMIT or omitting
>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery, through
>> which the user indicates they're not interested in any result rows,
>> whether those exist or not.

> ExecuteNonQuery returns an integer while row-returning queries do not.
> I'd argue that the API states that the user is declaring that the query
> they are executing does not return any actual rows - just a count of
> affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this
discussion, so I didn't mention it.

>> For the situation where a user does ExecuteNonQuery but the query returns
>> result rows, the driver can save the needless network transfers. We can
>> definitely say it's the user's fault for providing a query with a resultset
>> to ExecuteNonQuery, but we *do* have the user's clear intention that no
>> rows be fetched, so why not act on it. I agree this isn't a terribly
>> important optimization; the trigger for this question was first and
>> foremost curiosity: it seems strange the protocol allows you to specify
>> max_rows for any value other than 0.

> Yes, it does seem strange and, like Marko said, ideally would be
> deprecated. The fact that it cannot handle "zero rows" seems like an
> unnecessary limitation, and I cannot imagine that any values other than 0
> and all would be of practical usage. In the case of zero, returning the
> number of rows instead would be more useful than simply refusing to return
> anything, so even if something like this is needed the current
> implementation is flawed.

Just to be precise: what is strange to me is that the max_rows feature
exists but has no 0 value. You and Marko are arguing that the whole feature
should be deprecated (i.e. always return all rows).

>> Here's a possible believable use case which doesn't involve user neglect:
>> imagine some server-side function which has side-effects and also returns
>> some rows. In some situations the user is interested in the result rows,
>> but in others they only want the side-effect. The user would probably have
>> no control over the function, and their only way to *not* transfer the
>> result rows would be with a mechanism such as max_rows.

> Functions always return rows and so should not be executed using
> "ExecuteNonQuery". In most cases action-oriented functions return a single
> result-status row, so ignoring that row, while likely not advisable, is not
> exactly expensive.

Your description of functions doesn't hold for all functions, which is why
I tried to provide a use case. It is possible for some function to both
have a side-effect (e.g. modify some table) *and* return a large number of
rows. It may be legitimate for a user to want to have the side-effect but
not care about the rows. Ignoring one row isn't expensive; ignoring many
could be.
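Since the max_rows field keeps coming up, here is roughly what it looks
like at the wire level. This is only a Python sketch of the v3 Execute
message, for illustration; the helper name and defaults are mine, not from
any driver, but the layout (portal name, then a 32-bit row limit) is per
the protocol docs:

import struct

def execute_message(portal=b"", max_rows=0):
    # Execute ('E'): portal name as a NUL-terminated string, then an
    # int32 row limit. max_rows = 0 means "return all rows"; there is
    # no value that means "return no rows", which is the gap above.
    body = portal + b"\x00" + struct.pack("!i", max_rows)
    # Every v3 message: type byte + int32 length (counting itself) + body.
    return b"E" + struct.pack("!i", len(body) + 4) + body

So the "fetch zero rows" case simply has no encoding: the value that would
naturally express it is already taken to mean "fetch everything".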
> The basic question here becomes - the executor already must generate, in
> memory, all of the rows, so is there a way to properly interact with the
> server where you can request the number of rows that were generated but
> not be obligated to actually pull them down to the client. This doesn't
> seem like an unreasonable request, but assuming that it is not currently
> possible (of which I have little clue) then the question becomes who cares
> enough to design and implement such a protocol enhancement.

OK.

>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>> LIMIT 1? The two seem to be identical, except that one is expressed in the
>> SQL query and the other at the network protocol level.

> The planner does not have access to network-protocol-level options, while
> it does know about LIMIT.

That's an internal PostgreSQL matter (which, granted, may impact
efficiency). My comment about max_rows being equivalent to LIMIT was meant
to address Marko's argument that max_rows is dangerous because any row
might come out and tests may pass accidentally (but that holds for LIMIT 1
as well, doesn't it?).

> Expecting users to use an API without knowledge or control of the SQL that
> is being executed seems like a stretch to me. Expecting the driver to
> simply provide an easy way to access data from the common SQL idioms a
> user might use seems like a reasonable goal and leaves the smarts in the
> purview of the planner. The optimization you require doesn't seem
> unreasonable but also doesn't seem especially compelling - no matter how
> many people might be using ADO.NET (which provides no indication that they
> are trying to use APIs that are incompatible with the queries that they
> are sending).

Fair enough; I don't disagree with the above. The idea is less "no
knowledge or control of the SQL" and more a complicated prepared statement
that is executed in some places to fetch all rows, and in others to fetch
only 1 (maybe this is the most compelling use case). A rough sketch of what
that looks like at the protocol level follows below.
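To illustrate that last scenario: the extended query protocol already lets
a driver reuse one prepared statement and vary the row limit per Execute.
Below is a minimal Python sketch that only builds the message bytes
(nothing is sent; the statement name, the query, and the helper functions
are made up purely for illustration):

import struct

def msg(type_byte, body):
    # Every v3 message: one type byte, int32 length (counting itself), body.
    return type_byte + struct.pack("!i", len(body) + 4) + body

def cstr(s):
    # Strings in the protocol are NUL-terminated.
    return s.encode() + b"\x00"

stmt = "side_effect_query"  # hypothetical prepared-statement name

# Prepare once (Parse); no parameter types declared.
parse = msg(b"P", cstr(stmt) + cstr("SELECT * FROM do_stuff()") +
            struct.pack("!h", 0))

# Bind the unnamed portal to the statement: no params, default formats.
bind = msg(b"B", cstr("") + cstr(stmt) +
           struct.pack("!h", 0) + struct.pack("!h", 0) + struct.pack("!h", 0))

# Execute with a row limit of 1 vs. 0 (0 = no limit, i.e. all rows).
exec_one = msg(b"E", cstr("") + struct.pack("!i", 1))
exec_all = msg(b"E", cstr("") + struct.pack("!i", 0))

close_portal = msg(b"C", b"P" + cstr(""))  # Close the unnamed portal
sync = msg(b"S", b"")

# Call site that only wants the side-effect and at most one row:
first_row_only = parse + bind + exec_one + close_portal + sync
# Call site that wants everything (statement already prepared above):
all_rows = bind + exec_all + sync

When the row limit is hit, the server answers with PortalSuspended instead
of CommandComplete, so the driver either issues another Execute to keep
reading or closes the portal, as above. The missing piece, as discussed, is
a limit value that means "zero rows".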