Hi,

The schema of the "after" part depends on the table, i.e. it holds
different columns for each table.
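For example, two changelog events from two different tables might look
like this (heavily abbreviated; the table and field names are invented
for illustration):

{"op": "c", "after": {"id": 1, "name": "alice"}}      <- table "users"
{"op": "c", "after": {"order_id": 7, "total": 9.99}}  <- table "orders"
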
So, do you receive Debezium changelog statements for more than one table?
I.e., does the schema in the "after" part differ from message to message?
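
If they do differ and you mainly need the raw events archived, one
schema-free workaround is to not parse the JSON at all and write each
message into a single string column. A rough, untested sketch against the
Java DataStream API (assuming Flink 1.14 with flink-parquet and
flink-avro on the classpath; the class, field, and path names are just
placeholders I made up):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RawJsonToParquet {

    // One fixed wrapper schema for all tables: a single string column
    // holding the unparsed Debezium message.
    private static final Schema WRAPPER = SchemaBuilder
            .record("Envelope").fields()
            .requiredString("raw_json")
            .endRecord();

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // The FileSink only finalizes Parquet files on checkpoints.
        env.enableCheckpointing(60_000);

        // Stand-in for your real source, e.g. a Kafka consumer.
        DataStream<String> jsonStream = env.fromElements(
                "{\"after\": {\"id\": 1, \"name\": \"alice\"}}",
                "{\"after\": {\"order_id\": 7, \"total\": 9.99}}");

        DataStream<GenericRecord> wrapped = jsonStream
                .map(json -> {
                    GenericRecord record = new GenericData.Record(WRAPPER);
                    record.put("raw_json", json);
                    return record;
                })
                // GenericRecord needs an explicit Avro type info.
                .returns(new GenericRecordAvroTypeInfo(WRAPPER));

        wrapped.sinkTo(FileSink
                .forBulkFormat(new Path("/tmp/debezium-raw"),
                        ParquetAvroWriters.forGenericRecord(WRAPPER))
                .build());

        env.execute("raw json to parquet");
    }
}

This way a single sink handles every table, and the per-table schema only
matters later, when the files are parsed downstream. If you need real
per-table columns in the Parquet files, I think you will need the schemas
upfront (or one job/sink per table).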

Best,
Georg

On Fri, 3 Dec 2021 at 08:35, Kamil ty <kamilt...@gmail.com> wrote:

> Yes, the messages should follow the general Debezium JSON schema. The
> fields that need to be saved to the Parquet file are in the "after" key.
>
> On Fri, 3 Dec 2021 at 07:10, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>
>> Do the JSONs have the same schema overall? Or is each potentially
>> structured differently?
>>
>> Best,
>> Georg
>>
>> On Fri, 3 Dec 2021 at 00:12, Kamil ty <kamilt...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm wondering if it is possible to create a streaming Parquet file sink
>>> in PyFlink (Table API) or in Java Flink (DataStream API).
>>>
>>> To give an example of the expected behaviour: each element of the stream
>>> will contain a JSON string. I want to save this stream to Parquet files
>>> without having to explicitly define the schema/types of the messages
>>> (and using a single sink).
>>>
>>> If this is possible (perhaps in Java Flink using a custom
>>> ParquetBulkWriterFactory etc.), any direction for the implementation
>>> would be appreciated.
>>>
>>> Best regards
>>> Kamil
>>>
>>
