[ https://issues.apache.org/jira/browse/CAUSEWAY-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
PJ Fanning updated CAUSEWAY-3835:
---------------------------------
    Description:
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java

1. In `public Document parseDocument(final @Nullable String xml)`, you can avoid the `getBytes` call. It wastes memory and may make an incorrect assumption about the character encoding: not all XML originates as UTF-8. If you already have the XML as a `String`, you don't need to convert it back to bytes and force the XML parser to turn them back into chars. Pass the chars directly through a `StringReader` instead:
```
try (var sr = new StringReader(xml)) {
    var doc = documentBuilder.parse(new InputSource(sr));
    return doc;
}
```

  was:
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java

1. In `public Document parseDocument(final @Nullable String xml)`, you can avoid the `getBytes` call. It wastes memory and may make an incorrect assumption about the character encoding: not all XML originates as UTF-8. If you already have the XML as a `String`, you don't need to convert it back to bytes and force the XML parser to turn them back into chars. Pass the chars directly through a `StringReader` instead:
```
try (var sr = new StringReader(xml)) {
    var doc = documentBuilder.parse(new InputSource(sr));
    return doc;
}
```

2. `TransformerFactory` is susceptible to XML External Entity (XXE) attacks. The OWASP cheat sheet at https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html suggests setting:
```
TransformerFactory tf = TransformerFactory.newInstance();
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
```

> suggested improvements to _DocumentFactories.java
> ------------------------------------------------
>
>                 Key: CAUSEWAY-3835
>                 URL: https://issues.apache.org/jira/browse/CAUSEWAY-3835
>             Project: Causeway
>          Issue Type: Task
>          Components: Tooling
>            Reporter: PJ Fanning
>            Assignee: Andi Huber
>            Priority: Major
>
> https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java
> 1. In `public Document parseDocument(final @Nullable String xml)`, you can
> avoid the `getBytes` call. It wastes memory and may make an incorrect
> assumption about the character encoding: not all XML originates as UTF-8.
> If you already have the XML as a `String`, you don't need to convert it back
> to bytes and force the XML parser to turn them back into chars.
> ```
> try (var sr = new StringReader(xml)) {
>     var doc = documentBuilder.parse(new InputSource(sr));
>     return doc;
> }
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
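The two suggestions in this notification can be sketched together as follows. This is a minimal, hypothetical helper: the class name `DocumentFactoriesSketch`, its method names, and the choice to return `null` for a `null` input are illustrative assumptions, not the actual `_DocumentFactories` API. It parses a `String` by handing its chars to the parser through a `java.io.StringReader` wrapped in an `org.xml.sax.InputSource` (no `String` to `byte[]` round-trip, so no charset is involved), and it creates a `TransformerFactory` with the two attributes the OWASP cheat sheet recommends before serializing a `Document` back to text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not the real _DocumentFactories class.
final class DocumentFactoriesSketch {

    private DocumentFactoriesSketch() { }

    /** Parses XML already held as a String, feeding chars (not bytes) to the parser. */
    static Document parseDocument(String xml) throws Exception {
        if (xml == null) {
            return null; // assumption: null in, null out (real behaviour not shown in the issue)
        }
        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // StringReader supplies a character stream, so no getBytes()/charset decoding happens.
        try (var sr = new StringReader(xml)) {
            return documentBuilder.parse(new InputSource(sr));
        }
    }

    /** TransformerFactory with external DTD and stylesheet access disabled (OWASP suggestion). */
    static TransformerFactory newHardenedTransformerFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }

    /** Serializes a Document to a String with an identity Transformer from the hardened factory. */
    static String toXmlString(Document doc) throws Exception {
        Transformer t = newHardenedTransformerFactory().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        // Non-ASCII text survives because it never passes through a byte encoding step.
        Document doc = parseDocument("<greeting lang=\"fr\">Ça va?</greeting>");
        System.out.println(doc.getDocumentElement().getTextContent());
        System.out.println(toXmlString(doc));
    }
}
```

Per the `TransformerFactory.setAttribute` Javadoc, an implementation that does not recognize an attribute throws `IllegalArgumentException`, so a non-JDK factory picked up from the classpath may reject these settings rather than silently ignore them.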