On 15/03/2012 00:32, sebb wrote:
I don't see why, so long as you fetch the values once at the start.
I tried that and the parser was slower. Don't ask me why.
There's a theoretical problem with using a valid char value as the "disabled" indicator, at least with the parser as it stands. The parser assumes that the disabled char cannot occur in a file; that is not strictly true, so it could detect an escape where there is none. For example, the string "pu\\ufffeblic" is parsed as pu<BS>lic when using CSVFormat.TDF.withUnicodeEscapesInterpreted(true), i.e. the disabled char is treated as an escape, and \b = <BS> (backspace, U+0008). This could be avoided by checking isEscaping in the parser; similarly for the other chars that can be disabled.
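To make the failure mode concrete, here is a minimal standalone sketch. It is not the actual parser code: the class, the method and the DISABLED constant are hypothetical names, and it only assumes that U+FFFE serves as the "disabled" marker as described above. The guardWithFlag parameter switches between the sentinel-only comparison and the suggested isEscaping check.

```java
public class DisabledEscapeSketch {
    // Hypothetical sentinel meaning "no escape char configured" (assumption taken from the thread).
    static final char DISABLED = '\ufffe';

    /**
     * Resolves escape sequences in input.
     * With guardWithFlag == false, only the char comparison decides (sentinel-only behaviour).
     * With guardWithFlag == true, the escape branch also requires that escaping is enabled.
     */
    static String unescape(String input, char escape, boolean guardWithFlag) {
        boolean isEscaping = escape != DISABLED;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean treatAsEscape = c == escape && (!guardWithFlag || isEscaping);
            if (treatAsEscape && i + 1 < input.length()) {
                char next = input.charAt(++i);
                switch (next) {
                    case 'b': out.append('\b'); break; // backspace
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    default:  out.append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With escaping disabled and the input already containing a real U+FFFE (as it would after the unicode unescaping step), the unguarded variant swallows the sentinel and turns the following 'b' into a backspace, while the guarded variant leaves the text alone.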
Good point, I didn't see this one. It can also be avoided by filtering invalid sequences in UnicodeUnescapeReader (or \ufffe specifically). Character.isDefined(char) will tell us if the character exists. U+FFFE doesn't:
http://www.fileformat.info/info/unicode/char/fffe

Emmanuel Bourg
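A rough sketch of the filtering idea, for illustration only. This is not the real UnicodeUnescapeReader; the class and method names are made up, and it works on a String rather than a Reader to stay short. It decodes a backslash-u sequence only when Character.isDefined accepts the resulting char, and otherwise copies the sequence through unchanged. Rejecting the input instead would be another option.

```java
public class UnicodeUnescapeSketch {
    /** Returns the value of the four hex digits starting at from, or -1 if any is not a hex digit. */
    static int hex4(String s, int from) {
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                return -1;
            }
            value = value * 16 + d;
        }
        return value;
    }

    /** Decodes backslash-u escapes, leaving sequences that name an undefined char untouched. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("\\u", i) && i + 6 <= input.length()) {
                int code = hex4(input, i + 2);
                if (code >= 0 && Character.isDefined((char) code)) {
                    out.append((char) code);
                    i += 6;
                    continue;
                }
            }
            // Not a decodable escape: copy the current char literally.
            out.append(input.charAt(i++));
        }
        return out.toString();
    }
}
```

With this filter the escaped form of U+FFFE never becomes a real U+FFFE in the stream, so the parser cannot mistake it for the disabled marker.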