Hi Jose,
I faced a similar issue when working on schema evolution in the
Iceberg connector. RowData is optimized under the assumption that the
schema stays the same for the lifetime of the deployment. This avoids any
extra serialization work for every record.
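
For context, field access against RowData always goes through a
type-specific getter created from an externally supplied LogicalType, so
the reader must already know the schema. A minimal sketch using the
standard RowData.createFieldGetter API (the readFields helper is just for
illustration):

import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.RowType;

public class RowDataAccess {

    // Reads every field of a RowData using an externally supplied schema.
    // Without the RowType there is no way to pick the right accessor
    // (getInt, getString, ...), which is why RowData alone is not enough.
    static Object[] readFields(RowData row, RowType schema) {
        Object[] values = new Object[schema.getFieldCount()];
        for (int i = 0; i < schema.getFieldCount(); i++) {
            RowData.FieldGetter getter =
                    RowData.createFieldGetter(schema.getTypeAt(i), i);
            values[i] = getter.getFieldOrNull(row);
        }
        return values;
    }
}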

To work around this I see 2 options:
- Send the whole schema along with every record (as you mentioned in
your message). This is what I have seen in the Flink CDC implementation. It
is easy to implement and needs no external dependencies, but depending on
your use case it might add serious overhead.
- Send only the schemaId along with your records, and rely on an external
schema store to resolve the schema. This needs an external dependency, but
there is only minimal serialization overhead.

Those are the 2 options that I see; a rough sketch of both is below. I
would love to hear other ideas.
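
To make the trade-off concrete, here is a minimal sketch of a record
envelope covering both options; the class names, fields, and the
SchemaStore interface are illustrative stand-ins, not an existing Flink
API:

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Envelope that travels with every record.
public class RecordWithSchema implements Serializable {
    // Option 1: ship the full schema with each record, e.g. as a JSON or
    // Avro schema string. Self-contained, but adds per-record overhead.
    public String schemaJson;

    // Option 2: ship only an id and resolve it against an external schema
    // store. Minimal overhead, but an extra dependency.
    public long schemaId;

    // The serialized row payload itself.
    public byte[] payload;
}

// Hypothetical minimal client interface for the external schema store.
interface SchemaStore {
    String fetch(long schemaId);
}

// In the sink, resolve each schemaId once and serve it from a cache
// afterwards, so the lookup cost is paid per schema, not per record.
class SchemaResolver {
    private final Map<Long, String> cache = new HashMap<>();
    private final SchemaStore store;

    SchemaResolver(SchemaStore store) {
        this.store = store;
    }

    String resolve(long schemaId) {
        return cache.computeIfAbsent(schemaId, store::fetch);
    }
}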

Thanks, Peter
On Tue, Aug 27, 2024, 01:48 iñigo san jose <inhig...@gmail.com> wrote:

> Hi,
>
> I want to build a custom Sink that receives a Row (or GenericRowData or
> RowData, depending on your reply) and needs to do some processing before
> sending it to the external sink.
>
> So it should be something like this:
>
> Input -> ROW<Field1 Type1, Field2 Type2, ..., FieldN TypeN>
>
> Then I need to process that element; depending on what the type is, I
> need to process it differently, so I need to fetch the data type of each
> field at runtime. The types can change from run to run, so the sink won't
> know them in advance.
>
> Is there a way to get the types from the Row itself? I am OK using other
> Data types if needed.
>
> The only solution I found is passing a Schema along and, when iterating
> through the row, fetching the data type of each field from the schema.
>
> Thanks!
>
