Hi,

This is a known issue [1]. I was going to resolve it by adding some kind of hash table. As preparation for that, I recall adding equals() and hashCode() methods to several of the value classes, but I don't think we covered all of them.
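
For illustration only, here is a minimal sketch of why equals() and hashCode() are the prerequisite for a hash-based store. This is not Xerces code: UniqueValueSketch, FieldValue, ListStore and HashStore are hypothetical names, and FieldValue merely stands in for a schema value class. The list-based store mimics a linear contains() scan, while the hash-based store relies on the two methods agreeing with each other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch; none of these names are Xerces classes.
final class UniqueValueSketch {

    // Stand-in for a schema value class. equals() and hashCode() must be
    // consistent so that equal values fall into the same hash bucket.
    static final class FieldValue {
        private final String typeName;
        private final String lexical;

        FieldValue(String typeName, String lexical) {
            this.typeName = Objects.requireNonNull(typeName);
            this.lexical = Objects.requireNonNull(lexical);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FieldValue)) return false;
            FieldValue other = (FieldValue) o;
            return typeName.equals(other.typeName) && lexical.equals(other.lexical);
        }

        @Override
        public int hashCode() {
            return Objects.hash(typeName, lexical);
        }
    }

    // Linear scan: each check walks the whole list, so inserting n values
    // costs O(n^2) comparisons in total.
    static final class ListStore {
        private final List<FieldValue> values = new ArrayList<>();

        boolean addIfAbsent(FieldValue v) {
            if (values.contains(v)) return false;
            values.add(v);
            return true;
        }
    }

    // Hash-based store: Set.add() returns false for a duplicate, in expected
    // constant time per value.
    static final class HashStore {
        private final Set<FieldValue> values = new HashSet<>();

        boolean addIfAbsent(FieldValue v) {
            return values.add(v);
        }

        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        HashStore store = new HashStore();
        System.out.println(store.addIfAbsent(new FieldValue("ID", "a1"))); // true
        System.out.println(store.addIfAbsent(new FieldValue("ID", "a2"))); // true
        System.out.println(store.addIfAbsent(new FieldValue("ID", "a1"))); // false (duplicate)
    }
}
```

Both stores give the same answers; only the cost per check differs. A value class that lacks equals() and hashCode() would fall back to object identity, and the hash-based store would then miss duplicates, which is why every value class needs to be covered first.
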
Thanks.

[1] https://issues.apache.org/jira/browse/XERCESJ-1276

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Seibert, Olaf" <olaf.seib...@mpi.nl> wrote on 04/26/2016 11:23:26 AM:

> Hi,
>
> I'm including some screenshots from jvisualvm showing the time spent in
> parsing a rather large (19 MB) xml file in our program.
> (See https://tla.mpi.nl/tools/tla-tools/elan/)
>
> We use a Xerces SAX parser, and the overwhelming majority of the parsing
> time, around 140 seconds (the total time is somewhere around 145-150
> seconds) is spent in the above mentioned function
> org.apache.xerces.impl.xs.XMLSchemaValidator$ValueStoreBase.contains().
>
> Suggested in the second screenshot is that the ValueStoreBase uses a
> Vector to check uniqueness of id values. Given that this will cause
> quadratic time behaviour, it is no wonder that so much time is wasted!
>
> Is there a way to replace this with a more time-efficient implementation,
> short of disabling validation completely? I tried doing the latter, and
> then the entire file is parsed and processed in just a few seconds.
>
> (can you Cc any copies to me please, since I did not subscribe to this
> list; the instructions at http://xerces.apache.org/xerces2-j/jira.html
> don't say that this is required)
>
> Thanks,
> -Olaf.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org