On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote: > On Jun20, 2011, at 19:57 , Noah Misch wrote: > > I tested this patch using two systems, a CentOS 5 box with > > libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2 > > 2.6.31.dfsg-2ubuntu1.6. Both failed to build with this error: > > > > xml.c: In function `pg_xml_init': > > xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this > > function) > > xml.c:934: error: (Each undeclared identifier is reported only once > > xml.c:934: error: for each function it appears in.) > > make[4]: *** [xml.o] Error 1 > > > > It seems `xmlStructuredErrorContext' was added rather recently. It's not > > obvious which version added it, but 2.7.8 has it and 2.7.2 does not. > > Looking > > at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for > > both structured and generic handler functions. Based on that, I tried with > > a > > change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our > > xml.c. I ran the test suite and hit some failures in the xml test; see > > attached regression.diffs. I received warnings where the tests expected > > nothing or expected errors. > > Great :-(. I wonder if maybe I should simply remove the part of the patch > which try to restore the error handler and context in pg_xml_done(). I've > added that because I feared that some 3rd-party libraries setup their libxml > error handle just once, not before every library class. The code previously > didn't care about this either, but since we're using structured instead of > generic error reporting now, we'd now collide with 3-rd party libs which > also use structured error handling, whereas previously that would have worked. > OTOH, once there's more than one such library... > > Whats your stand on this?
Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's available and xmlGenericErrorContext otherwise, I would just add an Autoconf check to identify which one to use. > > I couldn't reconcile this description with the code. I do see that > > xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement(). Why is > > xmlelement() also ready for the strictest error reporting? > > xmlelement() doesn't use libxml's parser, only the xml writer. I didn't > want to suppress any errors the writer might raise (I'm not sure it raises > error at all, though). Makes sense. > > Also note that we may be able to be more strict when xmloption = document. > > Yeah, probably. I think that better done in a separate patch, though - I've > tried not to change existing behaviour with this patch except where absolutely > necessary (i.e. for XPATH) Likewise. > > Ideally, an invalid namespace URI should always be an error. While > > namespace > > prefixes could become valid as you assemble a larger document, a namespace > > URI's validity is context-free. > > I agree. That, however, would require xmlelement() to check all xmlns* > attributes, and I didn't find a way to do that with libxml() (Well, > other than to parse the element after creating it, but that's a huge > kludge). So I left that issue for a later time... Likewise. > > Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path > > seems fragile. How about using RegisterXactCallback/RegisterSubXactCallback > > in pgxml_parser_init() to handle those cases? You can also use it to Assert > > at non-abort transaction shutdown that pg_xml_done() was called as needed. > > Hm, isn't that a bit fragile too? It seems that PG_TRY/PG_CATCH blocks don't > necessarily create sub-transactions, do they? PG_TRY never creates a subtransaction. However, elog(ERROR, ...) aborts either the subtransaction, or if none, the top-level transaction. Therefore, registering the same callback with both RegisterXactCallback and RegisterSubXactCallback ensures that it gets called for any elog/ereport non-local exit. Then you only explicitly call pg_xml_done() in the non-error path. However, I see now that in the core xml.c, every error-condition pg_xml_done() appears in a PG_CATCH block. That's plenty reliable ... > Also, most elog() paths already contained xmlFree() calls, because libxml > seemingly cannot be made to use context-based memory management. So one > already needs to be pretty careful when touching these... ... and this is a good reason to keep doing it that way. So, scratch that idea. > > Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR, > > emphasizing that it wraps a conditional. > > Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it > because I felt it didn't convey the fact that it's actually ereport() > and not just return false on error. But I agree that XML_REPORT_ERROR isn't > very self-explanatory, either. What about XML_REPORT_PENDING_ERROR() > or XML_CHECK_AND_REPORT_ERROR()? XML_CHECK_AND_REPORT_ERROR() seems fine. Or XML_CHECK_AND_EREPORT()? > > On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote: > >> *************** xml_parse(text *data, XmlOptionType xmlo > >> *** 1175,1182 **** > >> int32 len; > >> xmlChar *string; > >> xmlChar *utf8string; > >> ! xmlParserCtxtPtr ctxt; > >> ! xmlDocPtr doc; > >> > >> len = VARSIZE(data) - VARHDRSZ; /* will be useful later */ > >> string = xml_text2xmlChar(data); > >> --- 1244,1251 ---- > >> int32 len; > >> xmlChar *string; > >> xmlChar *utf8string; > >> ! xmlParserCtxtPtr ctxt = NULL; > >> ! xmlDocPtr doc = NULL; > >> > >> len = VARSIZE(data) - VARHDRSZ; /* will be useful later */ > >> string = xml_text2xmlChar(data); > > > > Is this change needed? > > For "doc" it definitely is. Since we can no report an error even if > the parser returned a valid doc, there's now a "if (doc) xmlFree(doc)" > in the PG_CATCH patch. > > For "ctxt", strictly, speaking no. But only because "ctxt = > xmlNewParserCtxt()" > is the first line in the PG_TRY block, and therefore made the "xmlFree(ctxt)" > unconditional. But doing it this way seems more consistent and less > error-prone. I see it now; thanks. > > Is there something about this patch's other changes that make it necessary > > to > > strip multiple newlines, vs. one newline as we did before, and to do it in a > > different place in the code? > > There previously wasn't a separate function for that. I made it strip off > multiple newlines because that allowed me to keep > appendStringInfoLineSeparator() > simple while still making sure that the end result is exactly one newline > at the end of the string. Previously, the coded depended on libxml always > adding a newline at the end of messages, which I'm not even sure it true > for messages reported via the structured error handler. Fair enough. > > It's odd to have a purpose of "NONE" that pg_xml_init() callers can actually > > use. How about XML_PURPOSE_TRIVIAL, for callers that only call libxml2 > > functions that can't error? Also, I think it would be better to require a > > call to pg_xml_done() in all cases, just to keep things simple. Actually, I > > wonder if it's worth having XML_PURPOSE_TRIVIAL for the one user thereof. > > It > > doesn't save much. > > Yeah, that name really is a perfect example of horrible naming ;-) > > The pg_xml_done()-not-required semantics are necessary though. The problem is > that parse_xml_decl() is sometimes called after the caller has already called > pg_xml_init() himself. Not always though, so one cannot simply remove the > pg_xml_init() call from parse_xml_decl() - it might then use libxml without > it being initialized. So I made pg_xml_init() skip the error-handler setup > when purpose == XML_PURPOSE_NONE. This requires then, however, that > pg_xml_done() > is *not* called, because it'd restore settings that were never saved (well, > actually it'd tripe an assert that protects against that...). > > But yeah, this is all quite contorted. Maybe pg_xml_init() should simply be > split into two functions, one which initialized the library and one which > sets up error handling. That means that two calls instead of one are required, > but it makes the purpose of the individual functions much clearer... Oh, tricky. That's an option, and the error-handler-initialization function could just call the more-primitive one, so most callers would not need to know. Another way is to make xml_error_initialized a counter. If it's called a second time, increment and do nothing more. Only unregister the handler when it's decremented to zero. No strong preference here. > >> ! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */, > > > > It's fine to set the flag for legacy users, considering they could just not > > read it. The important distinction seems to be that we don't emit warnings > > or > > notices in this case. Is that correct? If so, the comment should reflect > > that emphasis. Then, consider updating the flag unconditionally. > > I took me a while to remember while I did it that way, but I think I have now. > > I initialled wanted to add an Assert(!xml_error_occurred) to catch any > missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that, > though... > > So I guess I should either refrain from setting the flag as you suggested, > or add such an Assert(). I think I very slightly prefer the latter, what > do you think? I do like the idea of that assert. How about setting the flag anyway, but making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"? > >> *** a/src/test/regress/expected/xml.out > >> --- b/src/test/regress/expected/xml.out > >> *************** INSERT INTO xmltest VALUES (3, '<wrong') > >> *** 8,14 **** > >> ERROR: invalid XML content > >> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > >> ^ > >> ! DETAIL: Entity: line 1: parser error : Couldn't find end of Start Tag > >> wrong line 1 > >> <wrong > >> ^ > >> SELECT * FROM xmltest; > >> --- 8,14 ---- > >> ERROR: invalid XML content > >> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > >> ^ > >> ! DETAIL: line 1: Couldn't find end of Start Tag wrong line 1 > > > > Any reason to change the error text this way? > > The "Entity:" prefix is added by libxml only for messages reported > to the generic error handler. It *doesn't* refer to entities as defined > in xml, but instead used in place of the file name if libxml > doesn't have that at hand (because it's parsing from memory). > > So that "Entity:" prefix really doesn't have any meaning whatsoever. So xmlParserPrintFileContext() sends different content to the generic error handler from what "natural" calls to the handler would send? > I believe that the compatibility risk is pretty low here, and removing > the meaningless prefix makes the error message are bit nicer to read. > But if you are concerned that there maybe applications out there who > parse the error text, we could of course add it back. I must say that > I don't really know what our policy regarding error message stability is... If the libxml2 API makes it a major pain to preserve exact messages, I wouldn't worry. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers