Hi,

This is a known issue [1]. I was going to resolve it by adding some kind
of hash table. As preparation for that, I recall adding equals() and
hashCode() methods to several of the value classes, but I don't think we
covered all of them.
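
To illustrate the idea only (this is not the Xerces code), here is a minimal
sketch of a hash-based uniqueness check. The class and method names
(UniqueValueCheck, FieldValue, firstDuplicate) are hypothetical and the value
class is a simplified stand-in for the real ones. It shows why equals() and
hashCode() have to be in place before a Vector scan can be swapped for a
hash set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class UniqueValueCheck {

    // Hypothetical stand-in for a validator value class (not a Xerces type).
    static final class FieldValue {
        private final String typeName;
        private final String lexicalValue;

        FieldValue(String typeName, String lexicalValue) {
            this.typeName = typeName;
            this.lexicalValue = lexicalValue;
        }

        // Two values are equal when both the type name and the lexical value match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FieldValue)) return false;
            FieldValue other = (FieldValue) o;
            return typeName.equals(other.typeName)
                    && lexicalValue.equals(other.lexicalValue);
        }

        // Must agree with equals(), otherwise HashSet lookups miss duplicates.
        @Override
        public int hashCode() {
            return Objects.hash(typeName, lexicalValue);
        }

        @Override
        public String toString() {
            return typeName + ":" + lexicalValue;
        }
    }

    // Returns the first value that was already seen, or null if all are unique.
    // Each HashSet.add() is expected constant time, so the whole pass is
    // roughly linear, instead of quadratic as with a contains() scan on a Vector.
    static FieldValue firstDuplicate(List<FieldValue> values) {
        Set<FieldValue> seen = new HashSet<>();
        for (FieldValue v : values) {
            if (!seen.add(v)) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<FieldValue> values = Arrays.asList(
                new FieldValue("ID", "a1"),
                new FieldValue("ID", "a2"),
                new FieldValue("ID", "a1"));
        System.out.println("First duplicate: " + firstDuplicate(values));
    }
}
```

Without the equals()/hashCode() overrides, the set would fall back to object
identity and never report two separately created but identical values as
duplicates, which is why those methods need to be added first.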

Thanks.

[1] https://issues.apache.org/jira/browse/XERCESJ-1276

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Seibert, Olaf" <olaf.seib...@mpi.nl> wrote on 04/26/2016 11:23:26 AM:

> Hi,
> 
> I'm including some screenshots from jvisualvm showing the time spent in
> parsing a rather large (19 MB) xml file in our program.
> (See https://tla.mpi.nl/tools/tla-tools/elan/)
> 
> We use a Xerces SAX parser, and the overwhelming majority of the parsing
> time, around 140 seconds (the total time is somewhere around 145-150
> seconds) is spent in the above mentioned function
> org.apache.xerces.impl.xs.XMLSchemaValidator$ValueStoreBase.contains().
> 
> Suggested in the second screenshot is that the ValueStoreBase uses a
> Vector to check uniqueness of id values. Given that this will cause
> quadratic time behaviour, it is no wonder that so much time is wasted!
> 
> Is there a way to replace this with a more time-efficient implementation,
> short of disabling validation completely? I tried doing the latter, and
> then the entire file is parsed and processed in just a few seconds.
> 
> (can you Cc any copies to me please, since I did not subscribe to this
> list; the instructions at http://xerces.apache.org/xerces2-j/jira.html
> don't say that this is required)
> 
> Thanks,
> -Olaf.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org

