Hello Yuzhan,

I replied in one of the tickets where a discussion has been going on for a
while.

The use of an encoder or decoder should be optional, as it could have a
negative effect on performance.

I am not sure we should consider flipping the processing and basing it on an
input stream instead of a reader to get access to the bytes before we would
have to convert them to characters; that seems like a radical change.
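
For anyone following the thread, the byte/character mismatch can be sketched
in a few lines of Java. This is only an illustration under stated
assumptions, not code from Commons CSV or from the PR: the class and method
names (ByteOffsetSketch, byteLength, byteOffsetOf) are hypothetical, and the
sketch assumes the character encoding of the source is known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Commons CSV.
class ByteOffsetSketch {

    /** Number of bytes the text occupies once encoded with the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /**
     * Byte offset at which the character at charIndex starts, computed by
     * encoding everything before it. Assumes charIndex does not fall in the
     * middle of a surrogate pair and that the charset writes no byte order mark.
     */
    static int byteOffsetOf(String text, int charIndex, Charset charset) {
        return text.substring(0, charIndex).getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "naïve,café" written with escapes so the source file encoding does not matter.
        String line = "na\u00efve,caf\u00e9";
        int secondField = line.indexOf(',') + 1;

        System.out.println("chars:             " + line.length());
        System.out.println("UTF-8 bytes:       " + byteLength(line, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes:  " + byteLength(line, StandardCharsets.ISO_8859_1));
        System.out.println("2nd field, char:   " + secondField);
        System.out.println("2nd field, byte:   " + byteOffsetOf(line, secondField, StandardCharsets.UTF_8));
    }
}
```

Re-encoding the consumed characters like this is extra work on every record,
which is why I would want it to stay optional.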

Gary

On Tue, Oct 29, 2024, 10:13 AM Yuzhan Jiang <yuzha...@umich.edu> wrote:

> Dear Apache Commons Development Team,
>
> I hope this message finds you well. I am reaching out to discuss an
> enhancement request for tracking byte information in CSV records parsed
> from CSV files.
>
> Currently, many users face challenges due to confusion between bytes and
> characters. This is especially problematic in situations where
> multi-byte characters are present, and the appropriate character encoding
> has not been applied. These issues often arise when bytes are mistakenly
> treated as characters, assuming a one-to-one correspondence.
>
> If there is agreement that this enhancement would benefit the community, I
> suggest we revisit the discussions on this topic, specifically around Matt
> Sun’s work on [CSV-196] Store the information of raw data read by lexer -
> ASF JIRA <https://issues.apache.org/jira/browse/CSV-196> and the PR that he
> submitted, [CSV-196] Track byte information of the source by mattsunsjf ·
> Pull Request #22 · apache/commons-cs
> <https://github.com/apache/commons-csv/pull/22>, along with any other
> similar issues that have been raised, such as [CSV-229] Allow byte position
> tracking in CSVParser - ASF JIRA
> <https://issues.apache.org/jira/browse/CSV-229>, so that we can just use
> the standard commons-csv libraries.
>
> Thank you for considering this request, and I look forward to any thoughts
> or feedback from the community.
>
> Best regards,
> Yuzhan
