XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Andreas Hubold
Hi, the regular expression for the encoding was changed in XmlStreamReader between 2.13.0 and 2.15.1. It now requires a version attribute in the XML declaration and no longer works with some real-world files. For example, the encoding from the following example declaration is respect

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Hello Andreas, Please try git master or a 2.16.0-SNAPSHOT build (https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.16.0-SNAPSHOT). I fixed this today as reported in https://github.com/apache/commons-io/pull/550 TY! Gary On Tue, Jan 2, 2024 at 9:33 AM Andreas Hubo

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Ah, you are talking about something different, I am sorry about that. Looking... Gary On Tue, Jan 2, 2024 at 9:35 AM Gary Gregory wrote: > > Hello Andrea, > > Please try git master or a 2.16.0-SNAPSHOT build > (https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Hi Andreas, In an "xml" PI, the "version" is NOT optional, see https://www.w3.org/TR/REC-xml/#sec-pi If we tried to handle all cases of invalid documents, then there would be no end to it. Gary On Tue, Jan 2, 2024 at 9:36 AM Gary Gregory wrote: > > Ah, you are talking about something different

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Andreas Hubold
Hi Gary, right, but it is optional for external entities (see https://www.w3.org/TR/xml/#TextEntities). The examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also don't have version attributes, so this might still be a valid use case? Cheers Andreas Gary Gregory wrote on 02.
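
To make the point above concrete, here is a minimal sketch in plain Java of the difference between requiring the version attribute and treating it as optional, as the text declaration grammar for external entities allows. The class name, the pattern names, and the regular expressions themselves are assumptions written for illustration only; they are not the patterns used inside Commons IO's XmlStreamReader.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the regular expression used by Commons IO.
final class EncodingDeclSketch {

    // version = "..." or '...', followed by mandatory whitespace.
    private static final String VERSION =
            "version\\s*=\\s*(?:\"[^\"]*\"|'[^']*')\\s+";

    // encoding = "name" or 'name'; the name follows the EncName production
    // (a letter, then letters, digits, '.', '_' or '-').
    private static final String ENCODING =
            "encoding\\s*=\\s*(?:\"([A-Za-z][A-Za-z0-9._-]*)\"|'([A-Za-z][A-Za-z0-9._-]*)')";

    // Accepts only declarations that carry a version attribute.
    static final Pattern VERSION_REQUIRED =
            Pattern.compile("^<\\?xml\\s+" + VERSION + ENCODING);

    // Also accepts declarations that omit the version attribute.
    static final Pattern VERSION_OPTIONAL =
            Pattern.compile("^<\\?xml\\s+(?:" + VERSION + ")?" + ENCODING);

    // Returns the declared encoding name, or null if the pattern does not match.
    static String encodingOf(Pattern pattern, String prolog) {
        Matcher m = pattern.matcher(prolog);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the double-quoted form, group 2 the single-quoted form.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        String textDecl = "<?xml encoding=\"ISO-8859-1\"?>";
        System.out.println(encodingOf(VERSION_REQUIRED, textDecl));
        System.out.println(encodingOf(VERSION_OPTIONAL, textDecl));
    }
}
```

With a declaration that has no version attribute, the first pattern finds nothing while the second still extracts the encoding name. Both patterns avoid nested quantifiers, which is one simple way to keep a more lenient expression from backtracking heavily on hostile input.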

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Andreas, I just remembered that we have a lenient setting that could be used to access a different regular expression that does not care about correctness. If we do support this, then the regular expression must be lenient enough but not so much that it can be used as an attack vector for resourc

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Ah, interesting, I'll look into it. Gary On Tue, Jan 2, 2024, 9:50 AM Andreas Hubold wrote: > Hi Gary, > > right, but it is optional for external entities, see > https://www.w3.org/TR/xml/#TextEntities > > And the examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also > don't have versio

Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
I fixed this in git master and 2.16.0-SNAPSHOT builds. Please test and report back! 🙂 Gary On Tue, Jan 2, 2024, 11:03 AM Gary Gregory wrote: > Ah, interesting, I'll look into it. > > Gary > > > On Tue, Jan 2, 2024, 9:50 AM Andreas Hubold > wrote: > >> Hi Gary, >> >> right, but it is optiona