Hello Yuzhan,

I replied in one of the tickets where a discussion has been going on for a while.
The use of an encoder or decoder should be optional, as it could have a negative effect on performance. I am not sure we should consider flipping the processing to base it on an input stream instead of a reader, just to get access to the bytes before we have to convert them to characters; that seems like a radical change.

Gary

On Tue, Oct 29, 2024, 10:13 AM Yuzhan Jiang <yuzha...@umich.edu> wrote:

> Dear Apache Commons Development Team,
>
> I hope this message finds you well. I am reaching out to discuss an
> enhancement request for tracking byte information in CSV records parsed
> from CSV files.
>
> Currently, many users face challenges due to misunderstandings between
> bytes and characters. This is especially problematic in situations where
> multi-byte characters are present, and the appropriate character encoding
> has not been applied. These issues often arise when bytes are mistakenly
> treated as characters, assuming a one-to-one correspondence.
>
> If there is agreement that this enhancement would benefit the community, I
> suggest we revisit the discussions on this topic, specifically around Matt
> Sun’s work on
> [CSV-196] Store the information of raw data read by lexer - ASF JIRA
> <https://issues.apache.org/jira/browse/CSV-196> and the PR that he had
> submitted, [CSV-196] Track byte information of the source by mattsunsjf ·
> Pull Request #22 · apache/commons-cs
> <https://github.com/apache/commons-csv/pull/22>, and any other similar
> issues that have been raised, such as [CSV-229] Allow byte
> position tracking in CSVParser - ASF JIRA
> <https://issues.apache.org/jira/browse/CSV-229>, so that we can just use
> the standard commons-csv libraries.
>
> Thank you for considering this request, and I look forward to any thoughts
> or feedback from the community.
>
> Best regards,
> Yuzhan
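
To make the byte/character mismatch described in the quoted message concrete, here is a minimal sketch that uses only the JDK, not the commons-csv API. The class name ByteVsCharOffsets, the helper byteOffsetOf, and the sample data are hypothetical and exist only for illustration. It assumes UTF-8 input and shows that the character offset of a record stops matching its byte offset as soon as a multi-byte character precedes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteVsCharOffsets {

    // Hypothetical helper: number of bytes that the first charOffset chars
    // of text occupy in the given charset. Assumes charOffset does not
    // fall in the middle of a surrogate pair.
    static int byteOffsetOf(String text, int charOffset, Charset charset) {
        return text.substring(0, charOffset).getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Made-up sample: the second record contains a 2-byte character
        // (e with acute accent) and the third a 3-byte character in UTF-8.
        String csv = "id,name\n1,Jos\u00e9\n2,\u674e\n";

        // Character offset at which the record starting with "2," begins.
        int charOffset = csv.indexOf("2,");
        int byteOffset = byteOffsetOf(csv, charOffset, StandardCharsets.UTF_8);

        System.out.println("char offset: " + charOffset); // 15
        System.out.println("byte offset: " + byteOffset); // 16
        System.out.println("total chars: " + csv.length()); // 19
        System.out.println("total bytes: "
                + csv.getBytes(StandardCharsets.UTF_8).length); // 22
    }
}
```

Re-encoding consumed characters like this is the kind of extra encoder work that could affect performance, which is why byte tracking would be better kept optional.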