I'm not sure if this helps with your need to vary the Sink's schema at runtime, but FWIW you can get the 'schema' of the input datastream via DataStream.getType <https://javadoc.io/static/org.apache.flink/flink-streaming-java_2.11/1.1.0/org/apache/flink/streaming/api/datastream/DataStream.html#getType-->. This returns a TypeInformation, which I think in your case you can cast(?) to a RowTypeInfo <https://nightlies.apache.org/flink/flink-docs-release-1.16/api/java/index.html?org/apache/flink/api/java/typeutils/RowTypeInfo.html> and examine the fields and types.
On Tue, Aug 27, 2024 at 1:01 AM Péter Váry <peter.vary.apa...@gmail.com> wrote: > Hi Jose, > I have facing a similar issue when working on schema evolution in the > Iceberg connector. The RowData is optimized in a way, that it is expected > to have the same schema for the lifetime of the deployment. This avoids any > extra serialization for every record. > > To work around this I see 2 options: > - Send the whole schema every time along the record (as you mentioned in > your message). This is what I have seen in the Flink CDC implementation. It > is easy to implement, no external dependencies but depending on your use > case might add serious overhead > - Send only the schemaId along your records, and depend on an external > schema store to resolve the schema. Needs an external depenency, but there > is only a minimal serialization overhead > > That is the 2 options that I see, and I would love to hear more. > > Thanks, Peter > On Tue, Aug 27, 2024, 01:48 iñigo san jose <inhig...@gmail.com> wrote: > >> Hi, >> >> I want to build a custom Sink that receives a Row (or GenericRowData or >> RowData, depending on your reply) and needs to do some processing before >> sending it to the external sink. >> >> So it should be something like this: >> >> Input -> ROW<Field1 Type1, Field2 Type2, ..., FieldN, TypeN> >> >> Then I need to process that element, depending what the Type is, I need >> to process it in a different way, so I need to fetch the Data Type of each >> field at runtime. The types can change from run to run, so the sink won't >> know them. >> >> Is there a way to get the types from the Row itself? I am OK using other >> Data types if needed. >> >> The only solution I found to this is passing an Schema and when iterating >> through the row and fetching the data type from the schema. >> >> Thanks! >> >