DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25767>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25767

Performance & threading issues in JAXPUtils' use of SAXParser & SAXParserFactory

           Summary: Performance & threading issues in JAXPUtils' use of
                    SAXParser & SAXParserFactory
           Product: Ant
           Version: 1.6.0
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: Minor
          Priority: Other
         Component: Core
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Comments on when JAXPUtils creates new factory / parser / document builder
objects and when it caches them.

1. Correctness:

private static SAXParserFactory parserFactory = null;
public static synchronized SAXParserFactory getParserFactory() {
    if (parserFactory == null) {
        parserFactory = newParserFactory();
    }
    return parserFactory;
}

(and similar methods) is likely not correct. An instance of SAXParserFactory is
not thread-safe and might not be reentrant, according to its Javadoc. If more
than one thread in the JVM is calling this method and using the same parser,
random corruption might result - this might happen to someone using <parallel>,
for example; or someone running several builds at once using an IDE integration.
Or if this method is called in the middle of a parse for some reason - to parse
some referenced file? - there might be corruption; at least SAX1's Parser says
it is not reentrant.

2. Performance: casual examination of the performance of running a tiny build
script w/ several antlib.xml's over and over using a profiler shows some 20-30%
of the CPU time being spent in JAXPUtils.getNamespaceXMLReader(), specifically
calling Xerces' SAXParserImpl.<init>(). Not an especially cheap call since it
sets up a lot of objects of various kinds; you are supposed to cache the result
from call to call if you want good performance and might be parsing a lot of
files. (E.g. a lot of small antlib.xml's, or many little build.xml's with
<subant>, etc.) Note that this time is not spent actually parsing the XML or
even opening the input stream to it - just getting a parser object.

Suggested fix to both problems:

1. Rework the helper methods in JAXPUtils to cache not just the factory objects
but the parsers & document builders themselves. I don't know how expensive
getting the factory is - JAXPUtils already caches them, and my profiling only
took place on a "warmed-up" JVM - but invoking the factories' new*() methods is
too expensive to do every time an XML file is parsed.

2. Leave uncached impls as e.g. SAXParser createSAXParser() and so on - these
would always create new ones. Or for compatibility, keep the current method
names, but deprecate them in favor of properly cached versions (see #3).

3. For cached usage, use e.g. SAXParser obtainSAXParser() and void
releaseSAXParser(SAXParser) methods. These should use java.lang.ThreadLocal:

private static final ThreadLocal/*<SAXParser>*/ saxParser = new ThreadLocal();
public static SAXParser obtainSAXParser() {
    SAXParser p = (SAXParser)saxParser.get();
    if (p != null) {
        saxParser.set(null);
    } else {
        p = createSAXParser();
    }
    return p;
}
public static void releaseSAXParser(SAXParser p) {
    saxParser.set(p);
}

// Usage:
SAXParser p = obtainSAXParser();
try {
   // parse...
} finally {
    releaseSAXParser(p);
}

Forgetting to release a parser is not a big deal, it will just need to be
recreated on the next obtain*() call. Using ThreadLocal ensures that different
threads never share the same object. Setting the ref to null in obtain*()
ensures that reentrant calls (i.e. calling obtain*() while the last-obtained
parser is still in use) will be safe.

One issue is that several utility methods do not return the original cacheable
object, but rather something derived from it, e.g.

return newSAXParser(getNSParserFactory()).getXMLReader();

For this usage you should probably have obtainNamespaceXMLReader +
releaseNamespaceXMLReader, etc. - i.e. keep a separate cache for each kind of
object - since you can't get back to the SAXParser from the XMLReader.

Could try to provide a patch if there is interest - at least modifications to
JAXPUtils, and usage of the new methods in e.g. ProjectHelper2.parse().

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to