GitHub user greghogan commented on the issue: https://github.com/apache/flink/pull/2060

Apologies for the long delay. I'd like to summarize this ticket and pull request to check my understanding.

Previously, `StringParser` used the system default encoding, while `GenericCsvInputFormat` used UTF-8 for the delimiter and UTF-8 by default for the comment prefix (with overloads accepting a different charset). `StringParser`'s `quoteCharacter` remains a `byte` with no associated encoding.

With this change, `GenericCsvInputFormat` can be configured with a charset that is applied to the delimiter, the comment prefix, and the field parsers (though only `StringParser` actually uses it).

Should `setCommentPrefix(String commentPrefix, Charset charset)` and `setCommentPrefix(String commentPrefix, String charsetName)` be removed from `GenericCsvInputFormat`? Would different encodings ever be used within the same file?

Should we allow the user to set the character encoding in `CsvReader`, to be applied in `CsvReader.configureInputFormat`?

Are the new tests actually checking the encoding? The test strings use only characters common to UTF-8 and ASCII, so they encode to the same bytes either way. We could instead use one of the UTF-16 encodings listed at https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
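To illustrate the point about the tests, here is a minimal JDK-only sketch. The class `EncodingCheck` and its helper methods are hypothetical and not part of Flink. It shows that ASCII-only strings produce identical bytes under UTF-8 and US-ASCII, while a UTF-16 encoding changes the bytes of both the data and a single-character delimiter, and that decoding with the wrong charset garbles non-ASCII text:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingCheck {

    // True if both charsets encode the text to exactly the same bytes.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    // Encodes with one charset and decodes with another, as happens when
    // a reader assumes the wrong encoding for a file.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String asciiOnly = "abc|def";

        // ASCII-only data: UTF-8 and US-ASCII give identical bytes,
        // so a test built on it cannot detect which charset was applied.
        System.out.println(sameBytes(asciiOnly, StandardCharsets.UTF_8, StandardCharsets.US_ASCII)); // true

        // UTF-16LE uses two bytes per character here, so the bytes differ.
        System.out.println(sameBytes(asciiOnly, StandardCharsets.UTF_8, StandardCharsets.UTF_16LE)); // false

        // The "|" delimiter: 1 byte in UTF-8, 2 in UTF-16LE,
        // 4 in UTF-16 (2-byte byte-order mark plus 2 bytes for the character).
        System.out.println(encodedLength("|", StandardCharsets.UTF_8));    // 1
        System.out.println(encodedLength("|", StandardCharsets.UTF_16LE)); // 2
        System.out.println(encodedLength("|", StandardCharsets.UTF_16));   // 4

        // Non-ASCII data written as UTF-8 but read as ISO-8859-1 is garbled:
        // a-umlaut (U+00E4) comes back as the two characters U+00C3 U+00A4.
        String garbled = roundTrip("\u00e4", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00c3\u00a4")); // true
    }
}
```

Note that `StandardCharsets.UTF_16` writes a byte-order mark when encoding, so a one-character delimiter becomes four bytes; `UTF_16LE` or `UTF_16BE` avoid that and may be simpler choices for test data.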