There might be too many different email threads on this with patches, but in case it went under the radar, xml-content-2006-3.patch appeared in my previous message on this thread[1].
It is based on a simple pre-check of the prefix of the input, determining which form of parse to apply. That may or may not be simpler than parse-once-save-error-parse-again-report-first-error, but IMV it's a more direct solution and clearer (the logic is clearly about "how do I determine the way this input should be parsed?" which is the problem on the table, rather than "how should I save and regurgitate this libxml error?" which turns the problem on the table to a different one).

I decided, for a first point of reference, to wear the green eyeshade and write a pre-check that exactly implements the applicable rules. That gives a starting point for simplifications that are probably safe.

For example, a bunch of lines at the end have to do with verifying the content inside of a processing-instruction, after finding where it ends. We could reasonably decide that, for the purpose of skipping it, knowing where it ends is enough, as libxml will parse it next and report any errors anyway.

That would slightly violate my intention of sending input to (the parser that wasn't asked for) /only/ when it's completely clear (from the prefix we've seen) that that's where it should go. The relaxed version could do that in completely-clear cases and cases with an invalid PI ahead of what looks like a DTD. But you'd pretty much expect both parsers to produce the same message for a bad PI anyway.

That made me just want to try it now, and--surprise!--the messages from libxml are not the same. So maybe I would lean to keeping the green-eyeshade rules in the test, if you can stomach them, but I would understand taking them out.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/5c8ecaa4.3090...@anastigmatix.net
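
P.S. To make the "relaxed" idea concrete, here is a minimal sketch in Python of what such a prefix pre-check could look like. It is not the code from the patch (which is C), and the function name starts_with_doctype is made up for illustration. It assumes the relaxed rule discussed above: a comment or processing instruction is skipped merely by finding where it ends, without validating its content, and an XML declaration is treated as just another "<?...?>" construct to skip.

```python
XML_WHITESPACE = " \t\r\n"


def starts_with_doctype(text: str) -> bool:
    """Hypothetical relaxed pre-check: return True when the input prefix
    consists only of whitespace, comments, and PIs followed by '<!DOCTYPE'.

    Assumption (relaxed rule): comments and PIs are skipped by locating
    their terminators only; their contents are not validated here and are
    left for the real parser to check.
    """
    i = 0
    n = len(text)
    while True:
        # Skip any run of XML whitespace.
        while i < n and text[i] in XML_WHITESPACE:
            i += 1
        if text.startswith("<!--", i):
            # Comment: skip to just past the closing '-->'.
            end = text.find("-->", i + 4)
            if end < 0:
                return False  # unterminated, so not completely clear
            i = end + 3
        elif text.startswith("<?", i):
            # PI (or XML declaration): skip to just past the closing '?>'.
            end = text.find("?>", i + 2)
            if end < 0:
                return False  # unterminated, so not completely clear
            i = end + 2
        else:
            break
    # Anything other than a DOCTYPE at this point means "not a clear
    # document prefix"; the caller keeps the parse mode that was asked for.
    return text.startswith("<!DOCTYPE", i)


if __name__ == "__main__":
    for sample in ("<!DOCTYPE a><a/>", "<a/>", "<!-- c --><?pi x?> <!DOCTYPE a><a/>"):
        print(repr(sample), starts_with_doctype(sample))
```

The green-eyeshade version would differ mainly inside the two skip branches, where it would also verify what lies between the delimiters (for instance, that a PI has a legal target) before agreeing to skip it.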