Re: [QUESTION] Support for different String encoding in RowCoder

2024-12-19 Thread Robert Bradshaw via dev
+1 to RowCoder being an implementation detail (and it doesn't make much sense to parameterize its string encodings; logically it just has string fields). It does make sense, however, to augment CsvIO so that it can name the encoding used to decode the bytes of the file (producing "standard"
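To illustrate the distinction drawn above, here is a minimal sketch in plain Java, assuming nothing about CsvIO itself. It decodes file bytes with a named charset into ordinary strings and re-encodes them on the way out. The class and method names (`CharsetRoundTrip`, `decode`, `encode`) are hypothetical and not part of Beam; only `java.nio.charset.Charset` is a real API here.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not a Beam class: shows that the charset only matters
// at the file boundary, while the in-memory value is an ordinary String.
class CharsetRoundTrip {
    // Decode raw file bytes using a named charset.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Encode a String to bytes in the named charset, e.g. when writing a file.
    public static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "café,1" as ISO-8859-1 bytes: 'é' is the single byte 0xE9.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, ',', '1'};
        String line = decode(latin1, "ISO-8859-1");

        System.out.println(line.equals("caf\u00e9,1"));          // true
        System.out.println(encode(line, "UTF-8").length);        // 7
        System.out.println(encode(line, "ISO-8859-1").length);   // 6
    }
}
```

The same string yields different byte lengths depending on the target charset, which is why the encoding would plausibly belong to the file read/write step rather than to the coder used between pipeline stages.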

Re: [QUESTION] Support for different String encoding in RowCoder

2024-11-27 Thread Reuven Lax via dev
The RowCoder encoding is not really intended to be an external encoding, i.e. it's not intended to be a stable encoding for writing into files. While it's fine to take in a PCollection<Row> in your write operation, I would not recommend just using RowCoder in order to generate the bytes written to the fi

Re: [QUESTION] Support for different String encoding in RowCoder

2024-11-27 Thread Damon Douglas
Hello Facu, Thank you for bringing this up. We have a GitHub issue that tracks exactly this: https://github.com/apache/beam/issues/32485. There I wrote up a candidate approach for refactoring CsvIOParse to support this feature. Feel free to continue the communication on that issu

[QUESTION] Support for different String encoding in RowCoder

2024-11-27 Thread Facundo Tomatis
Hello everyone! I've been developing a CSV connector that wraps CsvIO: the read operation outputs PCollection<Row> and the write operation takes PCollection<Row>. I am having issues setting the encoding of the resulting file and the input file. For example, I would like to write a CSV with ISO-8859-1 en