+1 to RowCoder being an implementation detail (and it doesn't make much sense to parameterize its string encodings; logically it just has string fields).
It does make sense, however, to augment CsvIO to be able to name an encoding that is used to decode the bytes of the file (producing "standard" Rows, which would presumably have in-memory representations as Java Strings, so the encoding chosen for them would not be an issue).

On Wed, Nov 27, 2024 at 4:56 PM Reuven Lax via dev <dev@beam.apache.org> wrote:
>
> The RowCoder encoding is not really intended to be an external encoding -
> i.e. it's not intended to be a stable encoding for writing into files. While
> it's fine to take in PCollection<Row> in your write operation, I would not
> recommend just using RowCoder to generate the bytes written to the
> file.
>
> On Wed, Nov 27, 2024 at 1:46 PM Facundo Tomatis <facundotoma...@gmail.com>
> wrote:
>>
>> Hello everyone!
>>
>> I've been developing a CSV connector that wraps CsvIO. The read
>> operation outputs PCollection<Row> and the write operation takes
>> PCollection<Row>. I am having trouble setting the encoding of the
>> input and output files; for example, I would like to write a CSV
>> with ISO-8859-1 or windows-1250 encoding, and read from those
>> encodings as well.
>>
>> Reading the source code, I found that Row's String fields (generated
>> with RowCoder.of(schema)) have a StringUtf8Coder associated. Is there
>> a way to change this coder to a custom coder while maintaining
>> PCollection<Row>?
>>
>> Thanks for your time.
>>
>> Facu.
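To make the suggestion concrete: the proposed CsvIO option would only need to do the charset decode at the file boundary, and everything downstream sees ordinary Java Strings. A minimal plain-JDK sketch of that decode step (no Beam involved; `decodeLine` is a hypothetical helper name, not an existing CsvIO method):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeSketch {

    // Decode raw CSV bytes using a caller-named charset. Once decoded,
    // the result is a standard Java String, so the file's original
    // encoding no longer matters to anything downstream (RowCoder
    // included, since it only sees Strings).
    static String decodeLine(byte[] rawBytes, String charsetName) {
        return new String(rawBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the single byte 0xE9 is 'é'
        // in that charset (the same bytes are not valid UTF-8).
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decodeLine(latin1, "ISO-8859-1")); // café

        // ASCII round-trips identically under UTF-8.
        byte[] ascii = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeLine(ascii, "UTF-8")); // abc
    }
}
```

On the write side the symmetric step would be `someString.getBytes(Charset.forName(charsetName))` just before the bytes hit the file, again leaving the PCollection<Row> untouched.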