Hi Arvid,

I certainly appreciate the points you make regarding schema evolution. I did end up writing an avro2sql script to autogenerate the DDL after all.
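For reference, a minimal sketch of the approach is below. It is only illustrative, not the exact script I'm running: it assumes the Avro schema is available as a local .avsc JSON file describing a flat record, handles only primitive and nullable-union field types, and hard-codes the Kafka connector options. Nested records, arrays, maps and logical types (e.g. timestamp-millis) would need extra handling.

#!/usr/bin/env python3
"""avro2sql sketch: generate a Flink CREATE TABLE DDL from an Avro .avsc file."""
import json
import sys

# Rough Avro-primitive -> Flink SQL type mapping (not exhaustive;
# logical types such as timestamp-millis or decimal are ignored here).
AVRO_TO_FLINK = {
    "string": "STRING",
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "bytes": "BYTES",
}

def flink_type(avro_type):
    # Nullable fields show up as unions like ["null", "string"].
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        return flink_type(non_null[0])
    return AVRO_TO_FLINK.get(avro_type, "STRING")  # fall back to STRING

def to_ddl(schema, table_name, topic):
    columns = ",\n".join(
        f"  `{field['name']}` {flink_type(field['type'])}"
        for field in schema["fields"]
    )
    return (
        f"CREATE TABLE {table_name} (\n{columns}\n) WITH (\n"
        f"  'connector' = 'kafka',\n"
        f"  'topic' = '{topic}',\n"
        f"  'format' = 'avro'\n"
        f");"
    )

if __name__ == "__main__":
    avsc_path, table_name, topic = sys.argv[1], sys.argv[2], sys.argv[3]
    with open(avsc_path) as f:
        schema = json.load(f)
    print(to_ddl(schema, table_name, topic))

Invoked as, say, "python avro2sql.py user.avsc Users user_topic" (hypothetical file, table and topic names), it prints a CREATE TABLE statement that can be pasted into the SQL DDL, which keeps the .avsc file as the single source of truth.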
Thanks,
Sumeet

On Fri, Apr 9, 2021 at 12:13 PM Arvid Heise <ar...@apache.org> wrote:

> Hi Sumeet,
>
> The beauty of Avro lies in having reader and writer schema and schema
> compatibility, such that if your schema evolves over time (which will
> happen in streaming naturally but is also very common in batch), you can
> still use your application as is without modification. For streaming, this
> methodology also implies that you can process elements with different
> schema versions in the same run, which is mandatory for any non-toy
> example.
>
> If you read into this topic, you will realize that it doesn't make sense
> to read from Avro without specifying your reader schema (except for some
> generic applications, but they should be written in DataStream). If you
> keep in mind that your same dataset could have different schemas, you will
> notice that your ideas quickly reach some limitations (which schema to
> take?). What you could do is write a small script to generate the schema
> DDL from the current schema in your actual data, if you have very many
> columns and datasets. It certainly would also be an interesting idea to
> pass a static Avro/JSON schema to the DDL.
>
> On Fri, Apr 2, 2021 at 10:57 AM Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Sumeet,
>>
>> I'm not a Table/SQL API expert, but from my knowledge, it's not viable
>> to derive SQL table schemas from Avro schemas, because table schemas
>> would be the ground truth by design.
>> Moreover, one Avro type can be mapped to multiple Flink types, so in
>> practice it may not be viable either.
>>
>> Best,
>> Paul Lam
>>
>> On Apr 2, 2021, at 11:34, Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>
>> Just realized, my question was probably not clear enough. :-)
>>
>> I understand that the Avro (or JSON, for that matter) format can be
>> ingested as described here:
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#apache-avro-format,
>> but this still requires the entire table specification to be written in
>> the "CREATE TABLE" section. Is it possible to just specify the Avro
>> schema and let Flink map it to an SQL table?
>>
>> BTW, the above link is titled "Table API Legacy Connectors", so is this
>> still supported? Same question for the YAML specification.
>>
>> Thanks,
>> Sumeet
>>
>> On Fri, Apr 2, 2021 at 8:26 AM Sumeet Malhotra <sumeet.malho...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to directly import an Avro schema while ingesting data
>>> into Flink? Or do we always have to specify the entire schema, either in
>>> SQL DDL for the Table API or using DataStream data types? From a code
>>> maintenance standpoint, it would be really helpful to keep one source of
>>> truth for the schema somewhere.
>>>
>>> Thanks,
>>> Sumeet