[io] Encoding bug in XmlStreamReader in Commons IO 2.14.0?

2023-10-03 Thread Laurence Gonsalves
Hello,
It looks like XmlStreamReader is not correctly handling several encodings in Commons IO 2.14.0 that previously worked in version 2.13.0. Here's a self-contained snippet (Kotlin) that demonstrates the problem:
    val xml = "Ç"
    val stream = xml.byteInputStream(Charset.forName("437"))

Re: [io] Encoding bug in XmlStreamReader in Commons IO 2.14.0?

2023-10-03 Thread Laurence Gonsalves
On Tue, Oct 3, 2023 at 1:39 AM sebb wrote:
>
> The byte input stream does not carry any encoding information, so the
> XmlStreamReader has to guess what encoding was used.
Determining what encoding to use when reading XML from a byte stream is the purpose of XmlStreamReader. From its documentatio

Re: [io] Encoding bug in XmlStreamReader in Commons IO 2.14.0?

2023-10-03 Thread Laurence Gonsalves
:'[A-Za-z]([A-Za-z0-9._]|-)*'))",
>
> This does not allow for an encoding that starts with a digit; i.e. it
> won't match encoding='437'
>
> AFAICT, no supported encodings start with a digit.
>
> The '437' encoding is actually kn
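To make the point about leading digits concrete, here is a minimal sketch that tests only the single-quoted fragment of the pattern quoted above against a few encoding names. It is not the full XmlStreamReader pattern; the name quotedEncName and the sample values 'UTF-8' and 'IBM437' are made up for illustration.

```kotlin
// Only the fragment quoted in this thread: a single-quoted encoding name
// whose first character must be an ASCII letter.
// (quotedEncName is an illustrative name, not a Commons IO identifier.)
val quotedEncName = Regex("'[A-Za-z]([A-Za-z0-9._]|-)*'")

fun main() {
    for (candidate in listOf("'UTF-8'", "'IBM437'", "'437'")) {
        // matches() requires the whole candidate string to match the fragment.
        println("$candidate -> ${quotedEncName.matches(candidate)}")
    }
}
```

The first two names start with a letter and match; '437' fails on its first character, which is the behaviour being discussed.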

Re: [io] Encoding bug in XmlStreamReader in Commons IO 2.14.0?

2023-10-03 Thread Laurence Gonsalves
On Tue, Oct 3, 2023 at 1:50 PM sebb wrote:
> > Given this inconsistency, and the fact that there are XML documents "in the
> > wild" that use these encoding names, would it be reasonable to relax the
> > regex just enough so that it'll work with these other names and aliases?
>
> I would say
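For illustration only, one way the "relax the regex just enough" idea could look is sketched here. This is an assumption, not the change Commons IO made: relaxedQuotedEncName and isUsableEncoding are hypothetical names, and the only relaxation shown is allowing a digit as the first character of a single-quoted name, followed by a check that the running JVM actually knows the charset.

```kotlin
import java.nio.charset.Charset

// Hypothetical relaxation (not taken from Commons IO): the first character
// may be a letter or a digit; the remaining characters are unchanged.
val relaxedQuotedEncName = Regex("'[A-Za-z0-9]([A-Za-z0-9._]|-)*'")

// Hypothetical helper: accept a quoted name only if it fits the relaxed
// pattern and the JVM reports the charset (or an alias of it) as supported.
fun isUsableEncoding(quoted: String): Boolean {
    if (!relaxedQuotedEncName.matches(quoted)) return false
    val name = quoted.substring(1, quoted.length - 1) // strip the quotes
    return Charset.isSupported(name)
}

fun main() {
    // Whether "437" is supported depends on the charsets shipped with the JVM.
    for (candidate in listOf("'UTF-8'", "'437'", "'not a name'")) {
        println("$candidate -> ${isUsableEncoding(candidate)}")
    }
}
```

Deferring the final decision to Charset.isSupported keeps the pattern permissive about syntax while still rejecting names the platform cannot decode.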