Request for Tracking Byte Information in CSV Records

Yuzhan Jiang Tue, 29 Oct 2024 07:13:36 -0700

Dear Apache Commons Development Team,

I hope this message finds you well. I am reaching out to discuss an enhancement 
request for tracking byte information in CSV records parsed from CSV files.


Currently, many users run into problems because bytes and characters are 
conflated. This is especially problematic when multi-byte characters are 
present and the appropriate character encoding has not been applied. These 
issues typically arise when bytes are treated as characters on the assumption 
of a one-to-one correspondence.
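
To illustrate the mismatch, here is a minimal sketch in plain Java. It does 
not use commons-csv, and the class and method names are hypothetical, chosen 
only for this example. It assumes the file is encoded as UTF-8 and shows how 
the character offset and the byte offset of a field drift apart as soon as a 
multi-byte character appears earlier in the record.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of commons-csv.
public class ByteVsCharOffsets {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int charLength(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies when encoded as UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-8 byte offset at which the character at charIndex starts.
    // Assumes charIndex does not fall in the middle of a surrogate pair.
    static int utf8ByteOffset(String s, int charIndex) {
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "id,<two CJK characters>,city": each CJK character takes 3 bytes in UTF-8.
        String record = "id,\u540D\u524D,city";
        int cityCharIndex = record.indexOf("city");

        System.out.println("characters: " + charLength(record));
        System.out.println("UTF-8 bytes: " + utf8ByteLength(record));
        System.out.println("char offset of last field: " + cityCharIndex);
        System.out.println("byte offset of last field: "
                + utf8ByteOffset(record, cityCharIndex));
    }
}
```

For this sample record the character count is 10 while the UTF-8 byte count 
is 14, so a caller that seeks into the file using a character position as if 
it were a byte position would land in the wrong place.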

If there is agreement that this enhancement would benefit the community, I 
suggest we revisit the earlier discussions on this topic, specifically Matt 
Sun’s work on
[CSV-196] Store the information of raw data read by lexer - ASF JIRA 
<https://issues.apache.org/jira/browse/CSV-196> and the PR that he 
submitted,
[CSV-196] Track byte information of the source by mattsunsjf · Pull 
Request #22 · apache/commons-cs 
<https://github.com/apache/commons-csv/pull/22>, as well as other similar 
issues that have been raised, such as
[CSV-229] Allow byte position tracking in CSVParser - ASF JIRA 
<https://issues.apache.org/jira/browse/CSV-229>, so that we can just use the 
standard commons-csv libraries.

Thank you for considering this request, and I look forward to any thoughts or 
feedback from the community.

Best regards,
Yuzhan
