Thanks, Mukul, for the implementation insights.
Duplicating the tree does indeed explain a lot. Saxon EE seems to handle this
differently in its implementation; when I compare the two directly, Saxon EE
is quite fast.

In general, I feel that XSD 1.1 adoption (and the use of assertions) is not
that widespread when I talk to other XML users/devs, so I can understand that
the incentive to improve this is practically non-existent.

I will see if I can find a way around this limitation.

Thanks, Daniel


From: Mukul Gandhi <muk...@apache.org>
Sent: Saturday, 27 March 2021 06:32
To: j-users@xerces.apache.org
Subject: Re: Java Heap Space problems with XSD 1.1 validation, asserts and
large files

On Thu, Mar 25, 2021 at 5:13 PM Zimmel, Daniel <d.zim...@esvmedien.de> wrote:
> My XML file is deeply nested and has 440.000 lines when indented.

I assume you mean that your XML file has 440,000 lines.

> Anyhow, when I change the XSD version to 1.1 and insert a sample assertion
> (xsd:assert test="false()") in the content model of my root element, my CPU
> and memory usage climb quite fast, and I even get a Java heap space error.

That is expected behaviour with Xerces. The Xerces XSD 1.1 implementation
constructs an in-memory DOM/XDM tree for each <xsd:assert>, rooted at the XML
instance element that is validated by the xsd:complexType carrying that
<xsd:assert>. In other words, the <xsd:assert> implementation is memory-hungry
for large XML instance documents whose elements on or near the root are
validated by <xsd:assert>, and particularly when the instance subtree under
such an element is deeply nested.
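
To illustrate why the placement of the assertion matters, here is a rough
sketch (not a measurement of Xerces itself) that streams a document with the
JDK's StAX API and reports how many elements the largest subtree rooted at a
given element name contains. Since the tree built for an <xsd:assert> is rooted
at the element being validated, an assertion on the root element covers every
element in the document, while one on a lower-level element covers only that
element's subtree. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: estimates how many elements an assertion attached to
// elements named `localName` would have to cover. It counts elements only;
// it says nothing about Xerces's actual per-node memory overhead.
public class AssertScopeEstimate {

    // Returns the element count of the largest subtree rooted at an element
    // with the given local name (the element itself included).
    public static long largestSubtree(String xml, String localName)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // One counter per currently open element named `localName`.
        Deque<long[]> open = new ArrayDeque<>();
        long largest = 0;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // The new element belongs to every enclosing matching subtree.
                    for (long[] count : open) {
                        count[0]++;
                    }
                    if (r.getLocalName().equals(localName)) {
                        open.push(new long[] {1});
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals(localName)) {
                    largest = Math.max(largest, open.pop()[0]);
                }
            }
        } finally {
            r.close();
        }
        return largest;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><item><x/><x/></item><item><x/></item></root>";
        // Assertion on <root>: 6 elements; on <item>: at most 3.
        System.out.println("root: " + largestSubtree(xml, "root")
                + ", item: " + largestSubtree(xml, "item"));
    }
}
```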

Here are some measures I would suggest for the issues you describe:
1) If possible, use identity constraints (IDC) or conditional type assignment
(CTA) instead of <xsd:assert>, or any other XSD construct that is not
<xsd:assert>, for validation.
2) Do part of the XML instance validation within your client code that invokes
Xerces XSD 1.1 validation.
3) Use the JVM options -Xms and -Xmx to tune the heap size as far as possible.
If feasible (for example, in a production, profit-making project), add more
RAM to the workstation where the XSD 1.1 validation runs.
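
As a sketch of suggestion 2, a co-occurrence rule that would otherwise need an
assertion can be checked in a single streaming pass with the JDK's StAX API,
so memory use does not grow with the size of the document. The <range>
element, its min/max attributes and the rule @min le @max are hypothetical
stand-ins for whatever your real assertion tests; the rest of the schema
(without that assertion) would still be validated by Xerces as before.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical client-side replacement for an assertion such as
//   <xsd:assert test="@min le @max"/>
// on a hypothetical <range> element whose min/max attributes are integers
// assumed to fit in a long. Only the current parse event is held in memory,
// not a tree of the document.
public class RangeCheck {

    public static List<String> violations(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> problems = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("range")) {
                    // A null namespace URI matches the attribute by local name only.
                    // Presence of the attributes is assumed to be enforced by the
                    // schema; a missing one throws NumberFormatException here.
                    long min = Long.parseLong(r.getAttributeValue(null, "min"));
                    long max = Long.parseLong(r.getAttributeValue(null, "max"));
                    if (min > max) {
                        problems.add("line " + r.getLocation().getLineNumber()
                                + ": min " + min + " > max " + max);
                    }
                }
            }
        } finally {
            r.close();
        }
        return problems;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><range min=\"1\" max=\"5\"/><range min=\"9\" max=\"2\"/></doc>";
        violations(xml).forEach(System.out::println);
    }
}
```

For suggestion 3, the heap limits are JVM launch options (for example
`java -Xms1g -Xmx8g ...`, sizes chosen only for illustration), and
`Runtime.getRuntime().maxMemory()` reports roughly the ceiling that -Xmx set.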

> Should I file a JIRA bug issue?

It's up to you. From my point of view, this issue is unlikely to lead to
improvements in the Xerces XSD 1.1 implementation code.



--
Regards,
Mukul Gandhi
