Hi Jose,

I was facing a similar issue when working on schema evolution in the Iceberg connector. RowData is optimized on the assumption that the schema stays the same for the lifetime of the deployment, which avoids any extra serialization work for every record.
To work around this I see 2 options:

- Send the whole schema along with every record (as you mentioned in your message). This is what I have seen in the Flink CDC implementation. It is easy to implement, with no external dependencies, but depending on your use case it might add serious overhead.
- Send only the schemaId along with your records, and depend on an external schema store to resolve the schema. This needs an external dependency, but there is only minimal serialization overhead. (A rough sketch of this approach follows after the quoted message below.)

Those are the 2 options that I see, and I would love to hear more.

Thanks,
Peter

On Tue, Aug 27, 2024, 01:48 iñigo san jose <inhig...@gmail.com> wrote:

> Hi,
>
> I want to build a custom Sink that receives a Row (or GenericRowData or
> RowData, depending on your reply) and needs to do some processing before
> sending it to the external sink.
>
> So it should be something like this:
>
> Input -> ROW<Field1 Type1, Field2 Type2, ..., FieldN TypeN>
>
> Then I need to process that element. Depending on what the Type is, I
> need to process it in a different way, so I need to fetch the Data Type
> of each field at runtime. The types can change from run to run, so the
> sink won't know them.
>
> Is there a way to get the types from the Row itself? I am OK with using
> other data types if needed.
>
> The only solution I found to this is passing a Schema and, when
> iterating through the row, fetching the data type of each field from
> the schema.
>
> Thanks!
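
For option 2, something like the following sketch could work on the sink side. SchemaStore here is a hypothetical client for whatever external store you pick (it is not an existing Flink API); the point is just that the lookup cost is paid once per schema, not once per record:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.table.types.logical.RowType;

public class CachingSchemaResolver {

    // Assumed client for the external schema store; illustrative only.
    public interface SchemaStore {
        RowType fetchSchema(long schemaId);
    }

    private final SchemaStore store;
    private final Map<Long, RowType> cache = new ConcurrentHashMap<>();

    public CachingSchemaResolver(SchemaStore store) {
        this.store = store;
    }

    // Resolve a schemaId to a RowType, hitting the external store only
    // on a cache miss.
    public RowType resolve(long schemaId) {
        return cache.computeIfAbsent(schemaId, store::fetchSchema);
    }
}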
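
And on the question of getting the types from the Row itself: RowData does not carry its field types, so the schema has to come from somewhere (passed into the sink, or resolved as above). Once you have a RowType, though, you can fetch each field's type at runtime and let Flink's RowData.FieldGetter do the per-type value extraction instead of writing your own switch. A rough sketch (class and method names are just for illustration):

import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;

public class RowDataFieldIterator {

    public static void processRow(RowType rowType, RowData row) {
        for (int i = 0; i < rowType.getFieldCount(); i++) {
            // The logical type of field i, available at runtime.
            LogicalType fieldType = rowType.getTypeAt(i);
            // FieldGetter dispatches on the logical type and handles
            // nulls, so no manual switch is needed to read the value.
            RowData.FieldGetter getter = RowData.createFieldGetter(fieldType, i);
            Object value = getter.getFieldOrNull(row);
            // Branch on fieldType.getTypeRoot() here if each type needs
            // its own handling.
            System.out.println(
                    rowType.getFieldNames().get(i) + " (" + fieldType + "): " + value);
        }
    }
}

In practice you would build the FieldGetters once per schema and reuse them for every record, rather than creating them per row as this sketch does.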