On Jun2, 2011, at 01:34 , Florian Pflug wrote:
> On Jun2, 2011, at 00:02 , Noah Misch wrote:
>> On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
>>> Anyway, I'll try to come up with a patch that replaces
>>> xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().
>>
>> Sounds sensible. Will this impose any new libxml2 version dependency?
>
> xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
> released on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
> we should be OK.
>
> I won't have access to my PC the next few days, but I'll try to come up with
> a patch some time next week.

Phew... I did manage to produce a patch, but it was way more work than I
had intended to put into this.

As it turns out, you lose the nicely formatted context information that
libxml2 provides via the generic error func once you switch to structured
error reporting. Registering handlers for both doesn't help either, since
the generic error handler isn't called once you register a structured one.

Fortunately, libxml does export xmlParserPrintFileContext(), which
generates these context messages. It doesn't return a string, however, but
instead passes the messages to the generic error handler (this time
regardless of whether a structured error handler is registered or not).

As it stood, the code assumed that all third-party libraries re-install
their libxml error handlers before each library call, and thus didn't
bother to restore the old error handler itself. Since I revamped the error
handling anyway, I removed that requirement. There is now a function
pg_xml_done() which restores the original error handler that we overwrote
in pg_xml_init().

I also realized that some libxml errors (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Finally, I had to adjust contrib/xml2 because it uses some parts of the
core XML support like pg_xml_init().

Here's the intended behaviour with the patch applied:
----------------------------------------------------

We always use structured error handling. For now, the error messages
pretty much resemble the old ones, but it's now easy to add additional
information.

XMLPARSE() and casting to XML check for parse errors only, like they do
without the patch. They're also capable of reporting warnings, but I didn't
find a case where the libxml parser generates a warning.

XPATH() reports all errors and warnings. Trying to use XPATH() on a
document with e.g. inconsistent namespace usage or invalid namespace URIs
therefore now raises an error. This is *necessary* because libxml's XPath
evaluator gets confused if it encounters e.g. an invalid namespace URI and
outputs invalid XML in response.

contrib/xml2's behaviour hasn't changed.

Patch is attached, and comments are welcome.

best regards,
Florian Pflug

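For readers following the thread, the save/restore and strictness scheme
described above can be sketched roughly as follows. This is only an
illustration and not code from the patch: it does not call libxml2 at all,
and every name in it (demo_state, demo_init, demo_done, demo_raise,
demo_run, the enum values) is hypothetical. The one assumption it makes is
a library with a single process-wide error handler slot that the caller
overwrites on entry and must put back on exit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are libxml2 or PostgreSQL names. */
typedef enum { ERR_DOMAIN_PARSER, ERR_DOMAIN_NAMESPACE } demo_err_domain;
typedef enum { STRICTNESS_WELLFORMED, STRICTNESS_ALL } demo_strictness;

typedef void (*demo_error_handler)(void *ctx, demo_err_domain domain,
                                   const char *msg);

/* Assumed model: one global handler slot shared by every user of the library. */
demo_error_handler current_handler = NULL;
void *current_ctx = NULL;

typedef struct {
    demo_strictness strictness;
    demo_error_handler saved_handler;  /* handler that was installed before us */
    void *saved_ctx;
    int error_count;
    char last_msg[128];
} demo_state;

/* Our handler: drops namespace errors unless strict mode was requested. */
void demo_handler(void *ctx, demo_err_domain domain, const char *msg)
{
    demo_state *st = ctx;

    if (domain == ERR_DOMAIN_NAMESPACE && st->strictness != STRICTNESS_ALL)
        return;
    st->error_count++;
    snprintf(st->last_msg, sizeof st->last_msg, "%s", msg);
}

/* Remember whatever handler was installed, then install ours. */
void demo_init(demo_state *st, demo_strictness strictness)
{
    st->strictness = strictness;
    st->saved_handler = current_handler;
    st->saved_ctx = current_ctx;
    st->error_count = 0;
    st->last_msg[0] = '\0';
    current_handler = demo_handler;
    current_ctx = st;
}

/* Put the previous handler back, so other users need not re-install theirs. */
void demo_done(demo_state *st)
{
    current_handler = st->saved_handler;
    current_ctx = st->saved_ctx;
}

/* Simulates the library reporting an error through the global slot. */
void demo_raise(demo_err_domain domain, const char *msg)
{
    if (current_handler != NULL)
        current_handler(current_ctx, domain, msg);
}

/* A stand-in for some other library's handler, to show it survives. */
int third_party_calls = 0;
void third_party_handler(void *ctx, demo_err_domain domain, const char *msg)
{
    (void) ctx; (void) domain; (void) msg;
    third_party_calls++;
}

/* One namespace error and one parse error; returns how many were recorded. */
int demo_run(demo_strictness strictness)
{
    demo_state st;

    demo_init(&st, strictness);
    demo_raise(ERR_DOMAIN_NAMESPACE, "namespace prefix is not defined");
    demo_raise(ERR_DOMAIN_PARSER, "premature end of data");
    demo_done(&st);
    return st.error_count;
}
```

In this sketch the lenient mode plays the role described for XMLPARSE()
(the namespace error is dropped, the parse error is recorded), while the
strict mode plays the role described for XPATH() (both are recorded). In
either mode, demo_done() leaves whatever handler was installed beforehand
in place again.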
pg_xml_errorhandling.v1.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers