Hi Stanimir,
Thank you so much for this very clear and elaborate explanation. It is
much clearer to me now that I have in fact been reading JEP-403
incorrectly, assuming it refers to any use of the JRE-bundled Xerces
parser. So, as long as we stick to coding against the JAXP APIs only and
do not use any Xerces API directly, we are good.
Thank you also for the hint on entity resolving! That's an interesting
approach which I hadn't considered.
Regards,
Martin
On 2024-01-16 17:54, Stanimir Stamenkov wrote:
Tue, 16 Jan 2024, /Martin Wunderlich/:
my use case is this: "A software developer having direct control over
the application implementation"
I might be misunderstanding the recommendation from JEP-403 then. In
which way is "the JDK's internal copy of the Xerces XML processor"
different from "JDK-bundled JAXP implementation"? I was assuming that
this is the same thing, as the JDK is using Xerces under the hood
as its JAXP implementation?
Yes, with "the JDK-bundled JAXP implementation" I'm referring to the
"JDK's internal copy of the Xerces". Note that this implementation is
different from Apache Xerces. At the very least, it uses different
packages - com.sun.org.apache.xerces.*
Still, JEP-403 doesn't recommend against using it as a runtime JAXP
provider - it recommends against referring to the implementation classes
directly in code, e.g.:
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl
but instead using the implementation only via the JAXP APIs. This is
because the implementation is an internal detail that is intended to be
encapsulated by JPMS, so it wouldn't be directly accessible by user code.
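To illustrate the distinction, here is a minimal sketch of code that stays on the JAXP API side. The class name JaxpOnlyExample and the helper newParser() are made up for this illustration; only javax.xml.parsers types appear in the source, and the concrete provider is whatever the runtime selects.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example class; not part of any application discussed here.
class JaxpOnlyExample {

    // Obtains a parser through the JAXP factory lookup. No implementation
    // class (such as anything under com.sun.org.apache.xerces.internal)
    // is named in this source file.
    static SAXParser newParser() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        return spf.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newParser();
        // Prints the class name of whichever provider was selected at
        // runtime. The code only inspects the name; it never links
        // against that class.
        System.out.println(parser.getClass().getName());
    }
}
```

Printing the class name is only for curiosity. The point is that the code compiles and runs the same way whichever JAXP provider is plugged in.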
In the code we're importing the following:
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
So it seems you're using only JAXP APIs and should be all fine,
regardless of which JAXP provider is plugged into the runtime.
Here's a more detailed example of what I meant developers should
always provide/configure on their parser instances:
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
EntityResolver noExternalEntitiesResolver = (publicId, systemId) -> {
    return new InputSource(new StringReader(""));
};
xmlReader.setEntityResolver(noExternalEntitiesResolver);
...
xmlReader.parse(...);
Using SAXParser directly would require you to implement
`resolveEntity(String publicId, String systemId)` on your
DefaultHandler (ContentHandler), instead.
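A rough sketch of that handler-based variant follows. The names HandlerResolverExample, CountingHandler and countElements are hypothetical and exist only for this illustration; the element counting is there just to give the handler something observable to do.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class HandlerResolverExample {

    // Counts elements and substitutes empty content for every
    // external entity, including the external DTD subset.
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Parses the given XML text with SAXParser.parse(InputStream, DefaultHandler),
    // so the handler also acts as the entity resolver.
    static int countElements(String xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        saxParser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><body/></html>";
        System.out.println(countElements(xml));
    }
}
```

Because the handler answers every resolveEntity call itself, the parser should not need to fetch the DTD from the network in this setup.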
You may implement a more sophisticated entity resolver whose behavior
may be configured via application-specific vs. general JAXP
configuration, and which could delegate to a CatalogResolver for
required entities packaged as application resources, but note how the
above example differs from "accessExternalDTD":
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
The latter configuration prohibits any external entities and will fail
the parse operation if an external reference is found, for example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The former example substitutes all external entities with empty content,
which would result in successful non-validating (XML DTD) parsing for
most XML documents. You could also have your entity resolver throw an
exception, if external entities are not expected and are strictly
prohibited by your application.
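The three behaviors discussed above (empty substitution, a resolver that throws, and "accessExternalDTD" set to the empty string) can be compared side by side with a small sketch. The class ExternalDtdPolicies and its members are hypothetical names for this illustration. The sketch assumes the selected JAXP provider recognizes the XMLConstants.ACCESS_EXTERNAL_DTD property, as the JDK-bundled one does; another provider may reject that property with an exception.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class ExternalDtdPolicies {

    static final String XHTML =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><body/></html>";

    // Substitutes empty content for every external entity.
    static final EntityResolver EMPTY =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Rejects every external entity by aborting the parse.
    static final EntityResolver STRICT = (publicId, systemId) -> {
        throw new SAXException("External entity not allowed: " + systemId);
    };

    // Returns true if the document parses without a SAXException.
    // A null resolver leaves the parser's default resolution in place.
    static boolean parses(String xml, EntityResolver resolver,
                          boolean blockExternalDtd) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        if (blockExternalDtd) {
            // Assumption: the provider supports this JAXP property.
            saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        }
        XMLReader xmlReader = saxParser.getXMLReader();
        if (resolver != null) {
            xmlReader.setEntityResolver(resolver);
        }
        // DefaultHandler rethrows fatal errors as exceptions.
        xmlReader.setErrorHandler(new DefaultHandler());
        try {
            xmlReader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("empty resolver:    " + parses(XHTML, EMPTY, false));
        System.out.println("strict resolver:   " + parses(XHTML, STRICT, false));
        System.out.println("accessExternalDTD: " + parses(XHTML, null, true));
    }
}
```

The design choice comes down to whether a DOCTYPE with an external reference should be tolerated (empty substitution) or treated as an error (throwing resolver or restricted access).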