This looks really useful, thanks for sharing this.

Is there any place on the Xerces site that explains the difference between the "Xerces2 Java 2.12.1" and "Xerces2 Java 2.12.1 (XML Schema 1.1)" downloads? In various places the site just says that it supports Schema 1.1. Is that only true if you use the "Xerces2 Java 2.12.1 (XML Schema 1.1)" version of the Xerces impl jar?

I'd be inclined to assume so, but one of the projects I'm involved with has XSD 1.1 schemas and I'm pretty sure they are using the "regular" download (checking the comment in the manifest inside the jar seems to confirm this). And they are performing validations with these schemas and not seeing any errors regarding the Schema 1.1 declarations.
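
Besides reading the jar manifest, one could also ask JAXP at runtime whether a factory for the XSD 1.1 schema language can be found on the classpath. The following is only a sketch: the class and method names are made up for illustration, and the URI is the XSD 1.1 schema language identifier as I understand it from the Xerces-J documentation. It only reports whether a factory lookup succeeds, not how a given schema is actually processed.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper, not part of Xerces-J.
final class Xsd11Probe {
    // Assumption: schema language URI used by Xerces-J for XSD 1.1.
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    // Returns true if some SchemaFactory on the classpath claims the given language.
    static boolean supports(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown by JAXP when no implementation of the language is available.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XSD 1.0 factory found: "
                + supports(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("XSD 1.1 factory found: " + supports(XSD_11));
    }
}
```

Running this with the project's own classpath should show which of the two schema languages the deployed jars can serve.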

I assume there is some statement on the Xerces site detailing the difference between the two flavors of download and I'm just somehow missing it. If someone could point me to it, I'd be much obliged.

Regards,

Eric



On 4/25/21 12:21 AM, Mukul Gandhi wrote:
Hi all,
    The findings below were made using the Xerces-J 2.12.1 XML Schema 1.1 distribution (available at http://xerces.apache.org/mirrors.cgi). I used JRE 1.8 to run my XML Schema validations.

In the past, many XML Schema 1.1 users on the Xerces-J forums have expressed concerns that xs:assert requires a lot of memory and run time during XML Schema validation with Xerces-J. I thought I should analyze this aspect a little more deeply and share my findings with list members here. For this, I ran various kinds of XML Schema 1.1 validations involving xs:assert (and some without xs:assert).

If you're interested in this topic, please download the XML and XSD document samples I've uploaded at https://drive.google.com/drive/folders/13lYOY-ECK8_AxbBLq9EcN56dK63Y-KCN?usp=sharing [1] (the downloadable zip archive is about 3.9 MB when you choose 'download all').

The XML documents that I've posted have data with the following pattern:

<?xml version="1.0" encoding="UTF-8"?>
<result>
   <AnalyticsArrangementKey id="5">8833857916</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857923</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857947</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857949</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833858104</AnalyticsArrangementKey>
   ... more sibling 'AnalyticsArrangementKey' elements
</result>

The file input_large.xml is about 65 MB in size and has 979224 sibling 'AnalyticsArrangementKey' elements (all these elements are very shallow). The file input_small.xml obeys the same schemas, but is very small (it has 10 sibling 'AnalyticsArrangementKey' elements, all of them very shallow).

To start with, I'll mention that the file input_small.xml validates very quickly with the XSD documents that I've posted, in all the scenarios that I've analyzed. Therefore, there is nothing to worry about in this case.

I do XSD validations in the following two ways:

1) Using the Xerces-J jaxp.SourceValidator sample.

2) Using the Java file XS11Validator.java that I've provided at the link [1] mentioned above.

I find that XS11Validator.java performs better than the sample jaxp.SourceValidator, and I share a few run time details about both below.
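
For readers who do not want to download the archive, here is a minimal sketch of what a small JAXP-based validator can look like. It is not the XS11Validator.java from link [1]; the class name, the error-counting handler and the command-line arguments are all assumptions made for illustration. The schema language URI is passed in, so the same code can be run with the XSD 1.0 URI on a plain JDK, or with the XSD 1.1 URI when the jars from the Xerces-J XML Schema 1.1 distribution are on the classpath.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch, not the author's XS11Validator.java.
final class ValidatorSketch {

    // Counts validation errors instead of printing each one to the console.
    static final class CountingHandler implements ErrorHandler {
        int errors;
        public void warning(SAXParseException e) { /* ignored in this sketch */ }
        public void error(SAXParseException e) { errors++; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // Validates xml against xsd and returns the number of validation errors.
    static int validate(String schemaLanguage, Source xsd, Source xml)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(schemaLanguage);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        CountingHandler handler = new CountingHandler();
        validator.setErrorHandler(handler);
        validator.validate(xml); // StreamSource input is read as a stream
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println(
                "usage: java ValidatorSketch <schema-language-uri> <schema.xsd> <input.xml>");
            return;
        }
        long start = System.nanoTime();
        int errors = validate(args[0],
                new StreamSource(new File(args[1])),
                new StreamSource(new File(args[2])));
        long seconds = (System.nanoTime() - start) / 1_000_000_000L;
        System.out.println(errors + " validation error(s) in " + seconds + " s");
    }
}
```

A possible invocation, assuming the XSD 1.1 jars are on the classpath, would be: java ValidatorSketch http://www.w3.org/XML/XMLSchema/v1.1 assert_1.xsd input_large.xml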

Below are my findings when using the XML input document input_large.xml for XML Schema validation:

1) Using test_1.xsd (this is an XSD 1.0 style schema). Not using JVM options -Xms & -Xmx. In this case, the default value for -Xmx is used (which I think is 256 MB). With jaxp.SourceValidator, the time taken to complete validation is 24 minutes.

2) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With jaxp.SourceValidator, the time taken to complete validation is 23 minutes.

3) Using test_1.xsd. Not using JVM options -Xms & -Xmx. With XS11Validator.java, the time taken to complete validation is 10 minutes.

4) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With XS11Validator.java, the time taken to complete validation is 17 minutes.

5) Using test_1.xsd. Using JVM options -Xms1024m and -Xmx4096m (that I can comfortably provide on my workstation). With jaxp.SourceValidator, the time taken to complete validation is 21 minutes.

6) Using assert_1.xsd. Using JVM options -Xms1024m and -Xmx4096m. With XS11Validator.java, the time taken to complete validation is 12 minutes.

7) Using assert_2.xsd. Using JVM options -Xms1024m and -Xmx4096m. With XS11Validator.java, the time taken to complete validation is 4 minutes.

8) Using assert_3.xsd. Not using JVM options -Xms & -Xmx. With XS11Validator.java, the time taken to complete validation is 6 minutes.

9) Using assert_3.xsd. Using JVM options -Xms1024m and -Xmx4096m. With XS11Validator.java, the time taken to complete validation is 6 minutes.
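
The heap limit that applies to runs without -Xms & -Xmx can be checked rather than assumed, since the default depends on the JVM version and on the machine (on many HotSpot JVMs it is derived from the physical memory). A small sketch, with a made-up class name, that prints the limits of the JVM it runs in:

```java
// Hypothetical helper for inspecting the heap settings of the current JVM.
final class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx or its default).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + rt.totalMemory() / MIB);
        System.out.println("used heap (MiB):      "
                + (rt.totalMemory() - rt.freeMemory()) / MIB);
    }
}
```

Running it once without options and once with -Xms1024m -Xmx4096m shows the two configurations used in the tests above.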

Following are my significant observations from the nine tests above:

a) Using an XML Schema validator like XS11Validator.java, instead of the sample jaxp.SourceValidator, should be the preferred approach for production-like deployments.

b) Use the JVM options -Xms & -Xmx whenever possible when validating large XML documents.

c) Compare results (1) and (2) above. The xs:assert schema doesn't take more time than the corresponding XSD 1.0 schema.

d) Compare results (1) and (3), and (2) and (4) above. Using XS11Validator.java reduces the run time compared to jaxp.SourceValidator.

e) Consider result (7). The validation outcome is valid in this case. Valid outcomes take less time to complete than invalid outcomes. I think there is an overhead in printing a large number of results to the console.

f) Consider results (8) and (9). In this case, the xs:assert works on a large XDM tree, unlike the other xs:assert cases mentioned in this mail.

Finally, I think that all the xs:assert cases except assert_3.xsd (in the context of this mail) will run to completion no matter how large the number of iterations is, and that their memory requirements do not grow beyond a certain maximum.

I hope that this mail has been useful.



--
Regards,
Mukul Gandhi
