Hi all,

I'm investigating the issue I reported here:
https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
As Tom Lane mentioned there, the docs (section 8.13) indicate that xmloption = CONTENT should accept all valid XML. At this time, XML with a DOCTYPE declaration is not accepted with this setting, even though it is valid XML. I'd like to work on a patch to address this and make it work as advertised.

I traced the source of the error to line ~1500 in src/backend/utils/adt/xml.c:

    res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string + count, NULL);

It looks like it is xmlParseBalancedChunkMemory from libxml that doesn't work when there's a DOCTYPE in the XML data. My assumption is that the DOCTYPE declaration makes the XML not well-balanced.

From: http://xmlsoft.org/html/libxml-parser.html#xmlParseBalancedChunkMemory

This function returns:

> 0 if the chunk is well balanced, -1 in case of args problem and the parser
> error code otherwise

I see xmlParseBalancedChunkMemoryRecover, which might provide the functionality needed. That function returns:

> 0 if the chunk is well balanced, -1 in case of args problem and the parser
> error code otherwise In case recover is set to 1, the nodelist will not be
> empty even if the parsed chunk is not well balanced, assuming the parsing
> succeeded to some extent.

I haven't yet tested whether this parses data with a DOCTYPE successfully. If it does, I don't think it would be difficult to update the check on res_code so that it does not fail. I'm also assuming that libxml returns a distinct code that would let us differentiate this case from other errors, but I couldn't find those codes quickly. The current check is this:

    if (res_code != 0 || xmlerrcxt->err_occurred)

Does this sound reasonable? Have I missed some major aspect? If this is on the right track, I can work on creating a patch to move this forward.

Thanks,

*Ryan Lambert*
RustProof Labs
www.rustprooflabs.com