Erik Wienhold <e...@ewie.name> writes: > On 2024-07-06 20:43 +0200, Tom Lane wrote: >> I'm disinclined to spend much effort on working around dot-zero bugs.
> Found an open issue about ABI compatibility that affects 2.12.7 and > possibly also 2.13: https://gitlab.gnome.org/GNOME/libxml2/-/issues/751. > Maybe just wait this one out, in case they'll bump SONAME. Whether they bump SONAME or put back the removed functions, there's no way that production-oriented distros are going to ship 2.13.x until something's done about that. That's good news for us, in that it gives us an excuse to not work around any dot-zero issues that get fixed before that point. But we'd better analyze our concerns and make sure anything we don't want to work around gets fixed by then. > I tried xmlCtxtSetErrorHandler but xmlParseBalancedChunkMemory doesn't > report to that. It's a known issue: > https://gitlab.gnome.org/GNOME/libxml2/-/issues/727 I saw that one. It would be good to have a replacement for xmlParseBalancedChunkMemory, because after looking at the libxml2 sources I realize that that's classed as a SAX1 function, which means it will likely go away at some point (maybe it's already not there in some builds). That's a long-term consideration though. In the short term, I dug in the libxml2 sources and found that xmlParseBalancedChunkMemory got pretty heavily refactored between 2.12 and 2.13. The reason for our immediate problem is that the return value in the old code is defined by if (!ctxt->wellFormed) { if (ctxt->errNo == 0) ret = 1; else ret = ctxt->errNo; } else { ret = 0; } whereas the new code just returns ctxt->errNo unconditionally. This does not agree with the phraseology in libxml2's documentation: 0 if the chunk is well balanced, -1 in case of args problem and the parser error code otherwise so I filed an issue at https://gitlab.gnome.org/GNOME/libxml2/-/issues/765 We'll see what they say about that. I think we could work around it as attached. This relies on seeing that the 2.13 code will return a node list if and only if ctxt->wellFormed is true (and we already eliminated the empty-input case, so an empty node list shouldn't happen). But it's not a lot less ugly than your proposal. Also, this only fixes the two wrong-output-from-xmlserialize test cases. I'm not so stressed about the cases where the errdetail changes, but I think we need to find an answer for the places where it fails and didn't before, like: SELECT xmlparse(content '<invalidns xmlns=''<''/>'); - xmlparse ---------------------------- - <invalidns xmlns='<'/> -(1 row) - +ERROR: invalid XML content regards, tom lane
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index d75f765de0..d5fa83e736 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -1840,6 +1840,18 @@ xml_parse(text *data, XmlOptionType xmloption_arg, res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string + count, parsed_nodes); + + /* + * libxml2 2.13.x incorrectly returns parser errors even if + * the chunk is well-balanced. As a workaround, ignore the + * return code if a node list was returned. + * https://gitlab.gnome.org/GNOME/libxml2/-/issues/765 + */ +#if LIBXML_VERSION >= 21300 + if (res_code > 0 && parsed_nodes != NULL) + res_code = 0; +#endif + if (res_code != 0 || xmlerrcxt->err_occurred) { xml_errsave(escontext, xmlerrcxt,