I'm not sure if this helps with your need to vary the Sink's schema at
runtime, but FWIW you can get the 'schema' of the input datastream via
DataStream.getType
<https://javadoc.io/static/org.apache.flink/flink-streaming-java_2.11/1.1.0/org/apache/flink/streaming/api/datastream/DataStream.html#getType-->.
This returns a TypeInformation, which I think in your case you can cast(?)
to a RowTypeInfo
<https://nightlies.apache.org/flink/flink-docs-release-1.16/api/java/index.html?org/apache/flink/api/java/typeutils/RowTypeInfo.html>
and examine the fields and types.



On Tue, Aug 27, 2024 at 1:01 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Hi Jose,
> I have facing a similar issue when working on schema evolution in the
> Iceberg connector. The RowData is optimized in a way, that it is expected
> to have the same schema for the lifetime of the deployment. This avoids any
> extra serialization for every record.
>
> To work around this I see 2 options:
> - Send the whole schema every time along the record (as you mentioned in
> your message). This is what I have seen in the Flink CDC implementation. It
> is easy to implement, no external dependencies but depending on your use
> case might add serious overhead
> - Send only the schemaId along your records, and depend on an external
> schema store to resolve the schema. Needs an external depenency, but there
> is only a minimal serialization overhead
>
> That is the 2 options that I see, and I would love to hear more.
>
> Thanks, Peter
> On Tue, Aug 27, 2024, 01:48 iñigo san jose <inhig...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to build a custom Sink that receives a Row (or GenericRowData or
>> RowData, depending on your reply) and needs to do some processing before
>> sending it to the external sink.
>>
>> So it should be something like this:
>>
>> Input -> ROW<Field1 Type1, Field2 Type2, ..., FieldN, TypeN>
>>
>> Then I need to process that element, depending what the Type is, I need
>> to process it in a different way, so I need to fetch the Data Type of each
>> field at runtime. The types can change from run to run, so the sink won't
>> know them.
>>
>> Is there a way to get the types from the Row itself? I am OK using other
>> Data types if needed.
>>
>> The only solution I found to this is passing an Schema and when iterating
>> through the row and fetching the data type from the schema.
>>
>> Thanks!
>>
>

Reply via email to