Hi Rodrigo,
For the connectors, PyFlink just wraps the Java implementation.
I am not an expert on Avro or its connectors, but as far as
I know, DataTypes really cannot declare the union type you mentioned.
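For example (a rough sketch of the 1.11 descriptor API, untested): a
Schema field can only be given one concrete DataType, so an Avro union
like ["long", "double", "string"] has no direct equivalent:

    from pyflink.table import DataTypes
    from pyflink.table.descriptors import Schema

    # Each field takes exactly one concrete type; there is no
    # DataTypes.UNION, so one member type must be picked.
    schema = (
        Schema()
        .field("value", DataTypes.DOUBLE())  # longs get cast to double;
                                             # strings fail at runtime
    )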
Regarding the bytes encoding you mentioned, I actually have no good
suggestions.
I think we need an Avro expert to answer your question.
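For reference, the UDTF-side workaround you describe might look roughly
like the sketch below (untested; it assumes the 'avro' pip package, and
the "Event" record and "value" field are made up for illustration):

    import io

    import avro.schema
    from avro.io import BinaryEncoder, DatumWriter
    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    # Hypothetical schema with a union of the value types you mention.
    SCHEMA = avro.schema.parse('''
    {"type": "record", "name": "Event", "fields": [
        {"name": "value", "type": ["long", "double", "string"]}
    ]}
    ''')

    @udf(input_types=[DataTypes.STRING()], result_type=DataTypes.BYTES())
    def to_avro_bytes(value):
        # Plain Avro binary encoding of a single record: no container-file
        # header and no extra framing bytes added by a sink format.
        buf = io.BytesIO()
        DatumWriter(SCHEMA).write({"value": value}, BinaryEncoder(buf))
        return buf.getvalue()

But the sink format would still have to carry those bytes through
unchanged, which is exactly the encoding problem you already described.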

Best,
Xingbo

rodrigobrochado <rodrigo.broch...@predito.com.br> wrote on Fri, Aug 14, 2020 at 10:07 AM:

>
> The upload of the schema through Avro(avro_schema) worked, but I had to
> select one type from the union type to put in Schema.field(field_type)
> inside t_env.connect(). If my dict has long and double values, and I
> declare Schema.field(DataTypes.DOUBLE()), all the long values are cast
> to double. My maps will also have string values, and the job will crash
> with this configuration.
>
> Is there any workaround? If not, I thought of serializing it in the UDTF
> using the Python avro lib and sending it as bytes to the sink. The problem
> is that all serialization formats change the original schema: the CSV
> format uses base64 encoding for bytes; the JSON format adds a key, forming
> a key/value pair where the value will be the binary; and the Avro format
> adds 3 bytes at the beginning of the message.
>
> Thanks,
> Rodrigo
