[ 
https://issues.apache.org/jira/browse/CAUSEWAY-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated CAUSEWAY-3835:
---------------------------------
    Description: 
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java

1. In `public Document parseDocument(final @Nullable String xml)`, you can 
avoid the getBytes call, which wastes memory and may make an incorrect 
assumption about the character encoding. Not all XML originates as UTF-8, and 
if you already have it as a String, you don't need to convert it back to bytes 
(forcing the XML parser to turn it back into chars).

```
        // InputSource accepts a Reader, so wrap the String in a StringReader
        try (var reader = new StringReader(xml)) {
            var doc = documentBuilder.parse(new InputSource(reader));
            return doc;
        }
```
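
For illustration only, here is a minimal, self-contained sketch of the same idea. The class name `ParseFromString` and the way the `DocumentBuilder` is created are assumptions for the example, not the actual code in `_DocumentFactories.java`. It relies on the documented `InputSource` behaviour that, when a character stream is supplied, the parser reads the characters directly and disregards any encoding declaration in the XML text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only to make the sketch runnable.
class ParseFromString {

    static Document parseDocument(String xml) {
        try (var reader = new StringReader(xml)) {
            // Assumption: a default, namespace-aware builder is good enough here.
            var factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            var documentBuilder = factory.newDocumentBuilder();
            // The parser consumes chars straight from the Reader,
            // so no getBytes call and no charset guess are needed.
            return documentBuilder.parse(new InputSource(reader));
        } catch (Exception e) {
            // Wrapped so callers do not have to handle checked exceptions.
            throw new IllegalStateException("failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The declared encoding is not UTF-8, yet parsing from chars still works.
        var xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<greeting>h\u00e9llo</greeting>";
        var doc = parseDocument(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Closing a `StringReader` is not strictly required, since it holds no system resources, but try-with-resources keeps the pattern uniform with other readers.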

  was:
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java

1. In `public Document parseDocument(final @Nullable String xml)`, you can 
avoid the getBytes call, which wastes memory and may make an incorrect 
assumption about the character encoding. Not all XML originates as UTF-8, and 
if you already have it as a String, you don't need to convert it back to bytes 
(forcing the XML parser to turn it back into chars).

```
        // InputSource accepts a Reader, so wrap the String in a StringReader
        try (var reader = new StringReader(xml)) {
            var doc = documentBuilder.parse(new InputSource(reader));
            return doc;
        }
```

2. TransformerFactory is susceptible to XML external entity (XXE) attacks unless external access is restricted

https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html

That page suggests setting:
```
TransformerFactory tf = TransformerFactory.newInstance();
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
```
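
As a rough sketch of how those two attributes could be applied in one place, the example below builds a restricted factory and runs an identity transform through it. The class and method names (`HardenedTransformer`, `newFactory`, `identity`) are hypothetical, and it assumes the JDK's built-in JAXP implementation; some third-party implementations may reject these attributes with an `IllegalArgumentException`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, used only to make the sketch runnable.
class HardenedTransformer {

    static TransformerFactory newFactory() {
        var tf = TransformerFactory.newInstance();
        // An empty string means "no protocols allowed" for external access.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }

    // Identity transform: copies the input XML to a String unchanged.
    static String identity(String xml) {
        try {
            var transformer = newFactory().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            var out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(identity("<a>x</a>"));
    }
}
```

Creating the factory through a single method like this means every transformer in the code base picks up the same restrictions.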

> suggested improvements to _DocumentFactories.java
> -------------------------------------------------
>
>                 Key: CAUSEWAY-3835
>                 URL: https://issues.apache.org/jira/browse/CAUSEWAY-3835
>             Project: Causeway
>          Issue Type: Task
>          Components: Tooling
>            Reporter: PJ Fanning
>            Assignee: Andi Huber
>            Priority: Major
>
> https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java
> 1. In `public Document parseDocument(final @Nullable String xml)`, you can 
> avoid the getBytes call, which wastes memory and may make an incorrect 
> assumption about the character encoding. Not all XML originates as UTF-8, 
> and if you already have it as a String, you don't need to convert it back to 
> bytes (forcing the XML parser to turn it back into chars).
> ```
>         // InputSource accepts a Reader, so wrap the String in a StringReader
>         try (var reader = new StringReader(xml)) {
>             var doc = documentBuilder.parse(new InputSource(reader));
>             return doc;
>         }
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
