Thank you for picking this up.

> is considering three cases: it got a 2-byte integer (and can continue on),
> or there aren't yet 2 more bytes available in the buffer, in which case it
> should return EOF without doing anything, or pqGetInt detected a hard
> error and updated the connection error state accordingly, in which case
> again there is nothing to do except return EOF.  In the patched code we
> have: ...
> which handles neither the second nor third case correctly: it thinks that
> "data not here yet" is a hard error, and then makes sure it is an error by
> destroying the parsing state :-(.
Marko and I think that, in protocol 3, all bytes of the incoming message
are guaranteed to have been loaded before getAnotherTuple() is entered.
The following part of pqParseInput3() ensures this:

| if (avail < msgLength)
| {
|     /*
|      * Before returning, enlarge the input buffer if needed to hold
|      * the whole message.
|      (snipped)..
|      */
|     if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
|                              conn))
|     {
|         /*
|          * XXX add some better recovery code...
|          (snipped)..
|          */
|         handleSyncLoss(conn, id, msgLength);
|     }
|     return;

So, if the cursor state is broken just after exiting getAnotherTuple(),
it must already have been broken BEFORE entering getAnotherTuple(),
according to the current design. That is what the 'protocol error'
means: pqGetInt there should not detect any error other than a broken
message.

> error, that possibly-useful error message is overwritten with an entirely
> useless "protocol error" text.

Also, the current pqGetInt seems to set its own error message only for
an invalid 'bytes' parameter. On the other hand, in protocol 2 (to be
removed?) the error handling mechanism does get involved, because a
full load of the message is not guaranteed there.

> I don't think the error return cases for the row processor have been
> thought out too well either.  The row processor is not in charge of what
> happens to the PGresult,

The default row processor stuffs the PGresult with tuples; another (say,
dblink's) leaves it empty. Each row processor manages the PGresult to
the extent of its own needs.

> and it certainly has no business telling libpq to just "exit
> immediately from the topmost libpq function".  If we do that
> we'll probably lose sync with the data stream and be unable to
> recover use of the connection at all.

I don't think the PGresult plays any part in the error handling system
in the current implementation. And nothing matching 'exit immediately
from the topmost libpq function' should be found anywhere in the patch.
The exit routes from the row processor are the following:

- Do a longjmp (or use the PG_TRY-CATCH mechanism) out of the row
  processor.

- The row processor returns 0 when entered from PQisBusy(), which makes
  PQisBusy() exit immediately.

Cursor consistency is kept in both cases: the cursor is already
positioned just past the last byte of the current message.

> Also, do we need to consider any error cases for the row
> processor other than out-of-memory?  If so it might be a good
> idea for it to have some ability to store a custom error
> message into the PGconn, which it cannot do given the current
> function API.

For dblink or PQgetRow, the need does not seem strong enough to justify
the additional complexity expected around it, so this patch does not
include it.

> In the same vein, I am fairly uncomfortable with the blithe assertion that
> a row processor can safely longjmp out of libpq.  This was never foreseen
> in the original library coding and there are any number of places that
> that might break, now or in the future.  Do we really need to allow it?

To protect against the row processor longjmp'ing out, in an early
version I enclosed the functions that could potentially throw an
exception in a PG_TRY-CATCH clause. This was totally safe, but the
penalty was not negligible because the TRY-CATCH was entered for every
row.

> If we do, it would be a good idea to decorate the libpq
> functions that are now expected to possibly longjmp with
> comments saying so.  Tracing all the potential call paths that
> might be aborted by a longjmp is an essential activity anyway.

Concerning the present (as opposed to the future), I can show you the
trail of the confirmation process:

- There is no difference between with and without the patch at the
  level of getAnotherTuple(), from the viewpoint of consistency.

- Assume pqParseInput3 detects that the next message has not yet
  arrived after getAnotherTuple has returned. It exits immediately on
  reading the length of the next message. This is the same condition
  as longjmp'ing out.
>     if (pqGetInt(&msgLength, 4, conn))
>         return;

- parseInput passes this through and immediately exits in a consistent
  state.

- The callers of PQgetResult, PQisBusy, PQskipResult, PQnotifies,
  PQputCopyData, and pqHandleSendFailure finally regain control.

I am convinced that the async status at that point must be PGASYNC_BUSY
and that the conn cursor is in a consistent state. So the document
encourages the ancestors of the row processor to call PQfinish,
PQgetResult, or PQskipResult after catching the longjmp. These
functions resolve the intermediate state described above, created by
the longjmp, by restarting the parsing of the stream afterwards.

And about the future: although it is a matter of course that any change
to the code can break anything, a longjmp stepping over libpq internal
code does seem complicated enough to cause trouble. I will mark the
affected functions with 'This function may be skipped over by a longjmp
invoked in its descendants.' (Better expressions are welcome..)

> Another design deficiency is PQskipResult().  This is badly
> designed for async operation because once you call it, it will
> absolutely not give back control until it's read the whole
> query result.  (It'd be better to have it set some kind of flag
> that makes future library calls discard data until the query
> result end is reached.)

If this function is called just after catching a longjmp from the row
processor, the async status of the connection at that time must be
PGASYNC_BUSY. So I think this function should always return, even if
the longjmp takes place at the last row of a result; there must be a
following 'C' message if the stream is not broken.

> Something else that's not very well-designed about it is that
> it might throw away more than just incoming tuples.  As an
> example, suppose that the incoming data at the time you call it
> consists of half a dozen rows followed by an ErrorResponse.
> The ErrorResponse will be eaten and discarded, leaving the
> application no clue why its transaction has been aborted, or
> perhaps even the entire session cancelled.

If the caller needs to see the following ErrorResponses, I think
calling PQgetResult is enough.

> What we probably want here is just to transiently install a
> row processor that discards all incoming data, but the
> application is still expected to eventually fetch a PGresult
> that will tell it whether the server side of the query
> completed or not.

The all-discarding row processor should not be visible to the user.
Users could design their own row processor to behave that way if they
wanted; this is merely a shortcut, but it seems difficult to achieve
without it.

> In the dblink patch, ... it's not clear what is the benefit of
> using malloc rather than palloc to manage the intermediate
> buffers in storeHandler --- what that seems to mostly
> accomplish is increase the risk of permanent memory leaks.

Hmm. I thought that palloc is heavier than malloc, and the allocated
blocks looked well controlled, so they seemed unlikely to leak. I will
change the code to use palloc, taking into account the discussion about
block granularity below.

> I don't see much value in per-column buffers either; it'd
> likely be cheaper to just palloc one workspace that's big
> enough for all the column strings together.

In the early version I had wanted to avoid counting the total length
before copying the contents. In the latest version the total memory
requirement is easily obtained, so it seems good to allocate it all
together. I'll do so.

> And speaking of leaks, doesn't storeHandler leak the
> constructed tuple on each call, not to mention whatever might
> be leaked by the called datatype input functions?

I copied the logic from the original materializeResult() into
storeHandler. tuplestore_donestoring was removed because it is
obsolete. So I think there is no problem there. Am I wrong?
> It also seems to me that the dblink patch breaks the case
> formerly handled in materializeResult() of a PGRES_COMMAND_OK
> rather than PGRES_TUPLES_OK result.  The COMMAND case is
> supposed to be converted to a single-column text result, and I
> sure don't see where that's happening now.

I'm sorry. I will restore that code path.

> BTW, am I right in thinking that some of the hunks in this
> patch are meant to fix a pre-existing bug of failing to report
> the correct connection name?  If so, it'd likely be better to
> split those out and submit as a separate patch, instead of
> conflating them with a feature addition.

Ok. I will split them out.

> Lastly, as to the pqgetrow patch, the lack of any demonstrated
> test case for these functions makes me uncomfortable as to
> whether they are well designed.  Again, I'm unconvinced that
> the error handling is good or that they work sanely in async
> mode.  I'm inclined to just drop these for this go-round, and
> to stick to providing the features that we can test via the
> dblink patch.

Testing pqGetRow via dblink? Do you mean dropping pqGetRow by 'drop
these'? If so, that part could be dropped separately from the rest.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers