Hi everyone,

in our project we've recently started using a RelaxNG schema to validate our XML documents through lxml, the Python bindings for libxml2. However, the errors reported for invalid documents are sometimes very unhelpful; even we as developers get confused and have to spend a few minutes looking for what's actually wrong. To demonstrate, I've simplified our schema and an invalid XML document, and appended them to this email together with a simple Python script. The script is not needed: running xmllint --relaxng schema.rng test.xml produces the same results.
The error that libxml reports is:

test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth

which is incorrect, since the actual error is that the eth element is missing a mandatory attribute (id).

What's also interesting is that if you completely remove the definition and all uses of the "define" element from the schema (test.xml doesn't use it, so it can stay the same), the error stack changes to:

test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_ATTRVALID: Element eth failed to validate attributes
test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth

This is a reasonable error message, even though it would be a bit more user friendly if it said which attributes failed or are missing, but I can understand that...

There are a few more scenarios where similar problems occur. I can describe them if needed, but to keep this email shorter I will ignore them for now. I've also found a few bug reports that describe similar situations, but since they were last updated several years ago I wanted to write here first before reviving them.

So I've done some digging around and figured out that all of these imprecise error reports are related to <interleave>, <optional> and <choice>, i.e. rules that can easily cause non-determinism. If the non-determinism is handled with some kind of backtracking, these kinds of problems can arise. The other way is to create a finite automaton, which can always be determinized, solving this problem. I looked through the libxml sources and found that a finite automaton is in fact created; however, I didn't find anything related to its determinization, so I'm assuming there isn't any. I apologize if I've missed something, but it's a fairly long source file...

I want to ask if this is a bug you would find worth fixing, or if the current behaviour is intended (since the bugs in the bug tracker are 5+ years old).
If not, I might consider fixing this myself, but I would like at least some comments on whether an implementation of the determinization could be integrated with how the validation is currently handled.

Thanks for your reply!

Best regards,
Ondrej Lichtner

--------------------------------------
schema.rng:
--------------------------------------
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="host">
      <attribute name="id"/>
      <interleave>
        <zeroOrMore>
          <ref name="params"/>
        </zeroOrMore>
        <element name="interfaces">
          <zeroOrMore>
            <ref name="eth"/>
          </zeroOrMore>
        </element>
      </interleave>
    </element>
  </start>
  <define name="define">
    <element name="define">
      <oneOrMore>
        <element name="alias">
          <attribute name="name"/>
          <choice>
            <attribute name="value"/>
            <text/>
          </choice>
        </element>
      </oneOrMore>
    </element>
  </define>
  <define name="eth">
    <element name="eth">
      <attribute name="id"/>
      <attribute name="label"/>
      <interleave>
        <optional>
          <ref name="define"/>
        </optional>
        <zeroOrMore>
          <ref name="params"/>
        </zeroOrMore>
        <optional>
          <ref name="addresses"/>
        </optional>
      </interleave>
    </element>
  </define>
  <define name="addresses">
    <element name="addresses">
      <interleave>
        <optional>
          <ref name="define"/>
        </optional>
        <zeroOrMore>
          <element name="address">
            <choice>
              <attribute name="value"/>
              <text/>
            </choice>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </define>
  <define name="params">
    <element name="params">
      <interleave>
        <optional>
          <ref name="define"/>
        </optional>
        <zeroOrMore>
          <element name="param">
            <attribute name="name"/>
            <choice>
              <attribute name="value"/>
              <text/>
            </choice>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </define>
</grammar>
--------------------------------------
test.xml:
--------------------------------------
<host id="slave1">
  <interfaces>
    <eth label="A">
      <addresses>
        <address value="192.168.100.1/24"/>
      </addresses>
    </eth>
  </interfaces>
</host>
--------------------------------------
test.py:
--------------------------------------
#!/usr/bin/python
from lxml import etree
from pprint import pprint

# Compile the RelaxNG schema
relaxng_doc = etree.parse("schema.rng")
schema = etree.RelaxNG(relaxng_doc)

# Validate the test document and print the collected errors
doc = etree.parse("test.xml")
schema.validate(doc)
pprint(schema.error_log)

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
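
To illustrate the determinization idea mentioned in the message, here is a minimal sketch of the classic subset construction in Python. It is not libxml2's code and assumes nothing about how libxml2 represents its automata. The state numbers, the EXAMPLE_NFA table and the function names are all made up for illustration, and the content model is simplified to a plain sequence (define? params* addresses?) rather than the <interleave> used in schema.rng.

```python
from collections import deque

EPSILON = None  # label for transitions that consume no input


def epsilon_closure(states, transitions):
    """Return every NFA state reachable from `states` via epsilon moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in transitions.get((state, EPSILON), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def determinize(start, accepting, transitions):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    symbols = {sym for (_, sym) in transitions if sym is not EPSILON}
    dfa_start = epsilon_closure({start}, transitions)
    dfa_transitions = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for sym in symbols:
            moved = set()
            for state in current:
                moved |= transitions.get((state, sym), set())
            if not moved:
                continue  # no transition on this symbol: the DFA rejects here
            target = epsilon_closure(moved, transitions)
            dfa_transitions[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_start, dfa_accepting, dfa_transitions


def run(dfa, children):
    """Return (ok, index); index is the first child with no valid transition,
    or len(children) if every child was consumed."""
    start, accepting, transitions = dfa
    state = start
    for index, name in enumerate(children):
        nxt = transitions.get((state, name))
        if nxt is None:
            return False, index
        state = nxt
    return state in accepting, len(children)


# Hypothetical NFA for the simplified sequence: define? params* addresses?
EXAMPLE_NFA = {
    (0, "define"): {1},
    (0, EPSILON): {1},      # <optional> define may be skipped
    (1, "params"): {1},     # <zeroOrMore> params loops on itself
    (1, "addresses"): {2},
    (1, EPSILON): {2},      # <optional> addresses may be skipped
}

dfa = determinize(0, {2}, EXAMPLE_NFA)
```

With this DFA, run(dfa, ["addresses", "define"]) stops at index 1, the first child that cannot be matched, and it gets there without any backtracking. That position is the kind of information a more precise error report could build on.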