On 15/03/2012 00:32, sebb wrote:

I don't see why, so long as you fetch the values once at the start.

I tried that and the parser was slower. Don't ask me why.


There's a theoretical problem with using a valid char value as a
disabled indicator, at least with the parser as it stands.
It assumes that the disabled char cannot occur in a file; that is not
strictly true, so it could detect an escape where there is none.

The example string "pu\\ufffeblic" is parsed as pu<BS>lic when using
CSVFormat.TDF.withUnicodeEscapesInterpreted(true) - i.e. the disabled
char is treated as an escape, and \b = <BS> (backspace).

This could be avoided by checking isEscaping in the parser; similarly
for the other chars that can be disabled.
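
To make the isEscaping idea concrete, here is a minimal Java sketch. It is not the actual parser code: the class EscapeCheckSketch, the method unescape and the explicit boolean flag are hypothetical names for illustration only, and DISABLED is assumed to be U+FFFE as in the example above.

```java
public class EscapeCheckSketch {

    // Assumed sentinel meaning "no escape char configured" (U+FFFE).
    public static final char DISABLED = (char) 0xFFFE;

    // Maps the char following an escape to its value; only a few cases shown.
    private static char translate(char c) {
        switch (c) {
            case 'b': return '\b';
            case 't': return '\t';
            case 'n': return '\n';
            case 'r': return '\r';
            default:  return c;
        }
    }

    // Escape handling runs only when 'escaping' is true, so a sentinel
    // char that happens to appear in the data is copied through as-is.
    public static String unescape(String in, char escape, boolean escaping) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (escaping && c == escape && i + 1 < in.length()) {
                out.append(translate(in.charAt(++i)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Passing escaping = true with escape = DISABLED mimics a parser that only compares chars, and turns the sentinel followed by 'b' into a backspace. Passing escaping = false leaves the input unchanged.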

Good point, I didn't see this one. It can also be avoided by filtering invalid sequences in UnicodeUnescapeReader (or \ufffe specifically). Character.isDefined(char) will tell us if the character exists. U+FFFE doesn't:

http://www.fileformat.info/info/unicode/char/fffe
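
A rough sketch of the filtering idea, in Java. It is not the UnicodeUnescapeReader code: UnicodeEscapeFilter and decode are hypothetical names, and leaving an undefined escape in place as literal text is just one possible policy (dropping it or throwing would be others). The only library call it depends on is Character.isDefined(char).

```java
public class UnicodeEscapeFilter {

    // True if the four chars starting at 'from' are all hex digits.
    private static boolean fourHexDigits(String s, int from) {
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Decodes backslash-u escapes, but keeps the escape as literal text
    // when the resulting char is not a defined Unicode character.
    public static String decode(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 6 <= in.length()
                    && in.charAt(i + 1) == 'u'
                    && fourHexDigits(in, i + 2);
            if (isEscape) {
                char decoded = (char) Integer.parseInt(in.substring(i + 2, i + 6), 16);
                if (Character.isDefined(decoded)) {
                    out.append(decoded);
                } else {
                    out.append(in, i, i + 6); // undefined: keep the six chars as-is
                }
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this policy the sentinel value can never be produced by an escape in the input, so the parser would not see it unless the file contains the raw char itself.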

Emmanuel Bourg
