Dear Apache Commons Development Team,

I hope this message finds you well. I am reaching out to discuss an enhancement request: tracking byte information for the records that commons-csv parses from CSV files.

Currently, many users run into problems because bytes and characters are confused with one another. This is especially problematic when multi-byte characters are present and the appropriate character encoding has not been applied. These issues typically arise when bytes are treated as characters on the assumption of a one-to-one correspondence, which breaks as soon as a single character occupies more than one byte.

If there is agreement that this enhancement would benefit the community, I suggest we revisit the earlier discussions on this topic, specifically:

- Matt Sun’s work on [CSV-196] Store the information of raw data read by lexer:
  <https://issues.apache.org/jira/browse/CSV-196>
- The PR he submitted, [CSV-196] Track byte information of the source (Pull Request #22):
  <https://github.com/apache/commons-csv/pull/22>
- A similar issue that has been raised, [CSV-229] Allow byte position tracking in CSVParser:
  <https://issues.apache.org/jira/browse/CSV-229>

Having this in the library would let us simply use the standard commons-csv release.

Thank you for considering this request. I look forward to any thoughts or feedback from the community.

Best regards,
Yuzhan