XuQianJin-Stars commented on a change in pull request #6823: [FLINK-10134] UTF-16 support for TextInputFormat bug refixed
URL: https://github.com/apache/flink/pull/6823#discussion_r226516362
##########
File path: flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java
##########
@@ -472,6 +498,7 @@ public void open(FileInputSplit split) throws IOException {
     this.offset = splitStart;
     if (this.splitStart != 0) {
+        setBomFileCharset(split);

Review comment:
I have two questions about the suggestions on this commit:

On the first suggestion: users often cannot know a file's encoding accurately. For example, if the file is encoded as `UTF-16LE` with a BOM header and the user specifies `UTF-16BE`, an error is reported. I also believe that UTF files carrying a BOM will be the majority. So I think it is necessary to do the BOM detection first, which gives a better user experience.

On the fourth suggestion: `GenericCsvInputFormat` cannot seek to position 0. It calls the `seek` method of `InputStreamFSInputWrapper`, and that method currently cannot seek to position 0.
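
To make the "detect the BOM first" argument concrete, here is a minimal sketch of the idea. It is not the code in this PR: `BomCharsetSketch`, `detect`, and `bomLength` are hypothetical names used only for illustration, and the sketch assumes the first few bytes of the file are already available as a byte array. It handles only the UTF-8 and UTF-16 BOMs and ignores UTF-32, whose little-endian BOM also starts with `FF FE`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Flink): a BOM, when present, overrides the configured charset. */
final class BomCharsetSketch {

    private BomCharsetSketch() {}

    /**
     * Returns the charset implied by a BOM at the start of {@code head},
     * or {@code configured} when no known BOM is found.
     * UTF-32 is deliberately not handled in this sketch.
     */
    static Charset detect(byte[] head, Charset configured) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return configured;
    }

    /** Number of leading BOM bytes a reader would skip before decoding records. */
    static int bomLength(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return 3;
        }
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) {
            return 2;
        }
        return 0;
    }

    /** Compares the first bytes of {@code data}, as unsigned values, with {@code prefix}. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this ordering, a `UTF-16LE` file with a BOM that the user declared as `UTF-16BE` resolves to `UTF-16LE` instead of failing, which is the case described above. Note that the BOM sits at byte 0 of the file, so a split with `splitStart != 0` would still need some way to read the file head, which is where the seek-to-0 limitation matters.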