This looks really useful, thanks for sharing this.
Is there any place on the Xerces site that explains the difference
between the "Xerces2 Java 2.12.1" and "Xerces2 Java 2.12.1 (XML Schema
1.1)" downloads? Various places on the site just say that it supports
Schema 1.1. Is that only the case if you use the "Xerces2 Java 2.12.1
(XML Schema 1.1)" version of the Xerces impl jar?
I'd be inclined to assume so, but one of the projects I'm involved with
has XSD 1.1 schemas and I'm pretty sure they are using the "regular"
download (checking the comment in the manifest inside the jar seems to
confirm this). They are performing validations with these schemas and
not seeing any errors regarding the Schema 1.1 declarations.
I assume there is some statement on the Xerces site detailing the
difference between the two flavors of download and I'm just somehow
missing it. If someone could point me to it, I'd be much obliged.
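In the meantime, one way to check what a given classpath actually
provides is to ask JAXP for a schema factory for the XSD 1.1 language.
The sketch below assumes that only the XSD 1.1 build registers a factory
for the URI "http://www.w3.org/XML/XMLSchema/v1.1"; I have not confirmed
that on the Xerces site. The class name SchemaVersionProbe is just for
illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

public class SchemaVersionProbe {
    // Assumption: schema-language URI used by the Xerces-J XSD 1.1 build.
    static final String XSD11_NS_URI = "http://www.w3.org/XML/XMLSchema/v1.1";

    // Returns true if some SchemaFactory on the classpath accepts the language.
    static boolean supports(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP throws this when no factory supports the schema language.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XSD 1.0 factory available: "
                + supports(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("XSD 1.1 factory available: "
                + supports(XSD11_NS_URI));
    }
}
```

Running this with the project's jars on the classpath should at least
show whether a 1.1-capable factory is being picked up.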
Regards,
Eric
On 4/25/21 12:21 AM, Mukul Gandhi wrote:
Hi all,
The findings below were made using the Xerces-J 2.12.1 XML Schema 1.1
distribution (available at
http://xerces.apache.org/mirrors.cgi). I used JRE 1.8 to run my XML
Schema validations.
In the past, many XML Schema 1.1 users on the Xerces-J forums have
expressed concerns that xs:assert requires a lot of memory and run
time during XML Schema validation with Xerces-J. I thought I should
analyze this aspect in more depth and share my findings with list
members here. For this, I ran various kinds of XML Schema 1.1
validations involving xs:assert (and some without xs:assert).
If you're interested in this topic, please download the XML and XSD
document samples I've uploaded at
https://drive.google.com/drive/folders/13lYOY-ECK8_AxbBLq9EcN56dK63Y-KCN?usp=sharing
[1] (the downloadable zip archive is about 3.9 MB when you choose
'download all').
The XML documents that I've posted have data with the following pattern:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<AnalyticsArrangementKey id="5">8833857916</AnalyticsArrangementKey>
<AnalyticsArrangementKey id="5">8833857923</AnalyticsArrangementKey>
<AnalyticsArrangementKey id="5">8833857947</AnalyticsArrangementKey>
<AnalyticsArrangementKey id="5">8833857949</AnalyticsArrangementKey>
<AnalyticsArrangementKey id="5">8833858104</AnalyticsArrangementKey>
... more sibling 'AnalyticsArrangementKey' elements
</result>
The file input_large.xml is about 65 MB in size and has 979224
sibling 'AnalyticsArrangementKey' elements (all these elements are
very shallow). The file input_small.xml obeys the same schemas, but is
very small (it has 10 'AnalyticsArrangementKey' sibling elements, all
of them very shallow).
To start with, I'll mention that the file input_small.xml validates
very quickly with the XSD documents that I've posted, for all the
scenarios that I've analyzed. Therefore, there are no problems to
worry about for this case.
I do XSD validations in the following two ways:
1) Using the Xerces-J jaxp.SourceValidator sample.
2) Using the Java file XS11Validator.java, which I've provided at the
link [1] mentioned above.
I find that XS11Validator.java performs better than the sample
jaxp.SourceValidator, and I'll share a few run-time details about
these below.
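For readers who have not downloaded the samples, a small dedicated
validator of this general kind can be sketched with plain JAXP as
follows. This is not the posted XS11Validator.java; the class name
TimedValidation, the inline XSD 1.0 schema and the one-element document
are hypothetical stand-ins. For XSD 1.1 one would presumably pass the
1.1 schema-language URI instead, with the Xerces-J XSD 1.1 jars on the
classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class TimedValidation {
    // Hypothetical XSD 1.0 schema for the document pattern shown above.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='result'><xs:complexType><xs:sequence>"
      + "<xs:element name='AnalyticsArrangementKey' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:simpleContent><xs:extension base='xs:long'>"
      + "<xs:attribute name='id' type='xs:int'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType>"
      + "</xs:element></xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static Source source(String text) {
        return new StreamSource(new StringReader(text));
    }

    // Compiles the schema, validates the document, and reports validity.
    static boolean isValid(String schemaLanguage, Source xsd, Source xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(schemaLanguage).newSchema(xsd);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validity error is thrown.
            validator.validate(xml);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><AnalyticsArrangementKey id='5'>8833857916"
                   + "</AnalyticsArrangementKey></result>";
        long start = System.nanoTime();
        boolean ok = isValid(XMLConstants.W3C_XML_SCHEMA_NS_URI,
                             source(XSD), source(xml));
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("valid=" + ok + ", elapsed=" + millis + " ms");
    }
}
```

For a file as large as input_large.xml, the StringReader sources would
be replaced with StreamSource objects built from java.io.File.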
Below are my findings when using the XML input document
input_large.xml for XML Schema validation:
1) Using test_1.xsd (this is an XSD 1.0 style schema). Not using JVM
options -Xms & -Xmx. In this case, the default value for -Xmx would be
used (which I think is 256 MB). With jaxp.SourceValidator, the time
taken to complete validation is 24 minutes.
2) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With
jaxp.SourceValidator, the time taken to complete validation is 23 minutes.
3) Using test_1.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 10 minutes.
4) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 17 minutes.
5) Using test_1.xsd. Using JVM options -Xms1024m and -Xmx4096m (that I
can comfortably provide on my workstation). With jaxp.SourceValidator,
the time taken to complete validation is 21 minutes.
6) Using assert_1.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 12 minutes.
7) Using assert_2.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 4 minutes.
8) Using assert_3.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 6 minutes.
9) Using assert_3.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 6 minutes.
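Because the default -Xmx value depends on the JVM and on the machine,
it can be worth confirming the heap limit that a run actually had. A
minimal sketch (the class name HeapCheck is hypothetical) that prints
the limit reported by the running JVM:

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: "
                + maxHeapMiB() + " MiB");
    }
}
```

Running it once without options and once with -Xms1024m -Xmx4096m
shows which limit applied to the tests listed above.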
The following are my significant observations from the nine tests
above:
a) Using an XML Schema validator like XS11Validator.java instead of
the sample jaxp.SourceValidator should be the preferred approach for
production-like deployments.
b) Use JVM options -Xms & -Xmx whenever possible for XML Schema
validation, when validating large XML documents.
c) Compare results (1) and (2) above. xs:assert doesn't take more
time than the corresponding XSD 1.0 schema.
d) Compare the results (1) and (3), and (2) and (4) above. Using
XS11Validator.java improves run-time as compared to jaxp.SourceValidator.
e) Consider result (7). The validation outcome is valid in this
case. Valid outcomes take less time to complete than invalid
outcomes. I think there is an overhead from printing a large
number of results to the console.
f) Consider results (8) and (9). In this case, the xs:assert works
on a larger XDM tree than in the other xs:assert cases mentioned
in this mail.
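One way to test the console-overhead guess in (e) would be to count
validation errors instead of printing each one. The sketch below is a
hypothetical SAX ErrorHandler written for that purpose (the class name
CountingErrorHandler is an assumption, and it is not part of the posted
samples).

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class CountingErrorHandler implements ErrorHandler {
    private long warnings;
    private long errors;
    private SAXParseException firstError;

    @Override
    public void warning(SAXParseException e) {
        warnings++;
    }

    @Override
    public void error(SAXParseException e) {
        // Keep only the first error for reporting; count the rest silently.
        if (firstError == null) {
            firstError = e;
        }
        errors++;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Fatal errors (e.g. malformed XML) still stop the run.
        throw e;
    }

    public long warnings() { return warnings; }
    public long errors() { return errors; }
    public SAXParseException firstError() { return firstError; }

    public static void main(String[] args) {
        CountingErrorHandler handler = new CountingErrorHandler();
        handler.error(new SAXParseException("first problem", null));
        handler.error(new SAXParseException("second problem", null));
        System.out.println(handler.errors() + " errors; first: "
                + handler.firstError().getMessage());
    }
}
```

The handler would be installed with Validator.setErrorHandler(...)
before calling validate(...); comparing run times with and without
per-error printing would show how much of the difference in (e) is
console output.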
Finally, I think that all xs:assert cases except assert_3.xsd (in
the context of this mail) will run to completion no matter how
large the number of iterations is, and that the memory requirements
do not grow beyond a certain maximum.
I hope this mail has been useful.
--
Regards,
Mukul Gandhi