Hi Arvid,

I certainly appreciate the points you make regarding schema evolution. I did end up writing an avro2sql script to autogenerate the DDL after all.
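For reference, a minimal sketch of the approach is below. It is only illustrative, not the exact script I'm running: it assumes the Avro schema is available as a local .avsc JSON file describing a flat record, handles only primitive and nullable-union field types, and hard-codes the Kafka connector options. Nested records, arrays, maps and logical types (e.g. timestamp-millis) would need extra handling.

#!/usr/bin/env python3
"""avro2sql sketch: generate a Flink CREATE TABLE DDL from an Avro .avsc file."""
import json
import sys

# Rough Avro-primitive -> Flink SQL type mapping (not exhaustive;
# logical types such as timestamp-millis or decimal are ignored here).
AVRO_TO_FLINK = {
    "string": "STRING",
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "bytes": "BYTES",
}

def flink_type(avro_type):
    # Nullable fields show up as unions like ["null", "string"].
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        return flink_type(non_null[0])
    return AVRO_TO_FLINK.get(avro_type, "STRING")  # fall back to STRING

def to_ddl(schema, table_name, topic):
    columns = ",\n".join(
        f"  `{field['name']}` {flink_type(field['type'])}"
        for field in schema["fields"]
    )
    return (
        f"CREATE TABLE {table_name} (\n{columns}\n) WITH (\n"
        f"  'connector' = 'kafka',\n"
        f"  'topic' = '{topic}',\n"
        f"  'format' = 'avro'\n"
        f");"
    )

if __name__ == "__main__":
    avsc_path, table_name, topic = sys.argv[1], sys.argv[2], sys.argv[3]
    with open(avsc_path) as f:
        schema = json.load(f)
    print(to_ddl(schema, table_name, topic))

Invoked as, say, "python avro2sql.py user.avsc Users user_topic" (hypothetical file, table and topic names), it prints a CREATE TABLE statement that can be pasted into the SQL DDL, which keeps the .avsc file as the single source of truth.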
Thanks,
Sumeet

On Fri, Apr 9, 2021 at 12:13 PM Arvid Heise <ar...@apache.org> wrote:

> Hi Sumeet,
>
> The beauty of Avro lies in having reader and writer schema and schema
> compatibility, such that if your schema evolves over time (which will
> happen in streaming naturally but is also very common in batch), you can
> still use your application as is without modification. For streaming, this
> methodology also implies that you can process elements with different
> schema versions in the same run, which is mandatory for any non-toy
> example.
>
> If you read into this topic, you will realize that it doesn't make sense
> to read from Avro without specifying your reader schema (except for some
> generic applications, but they should be written in DataStream). If you
> keep in mind that your same dataset could have different schemas, you will
> notice that your ideas quickly reach some limitations (which schema to
> take?). What you could do is write a small script to generate the schema
> DDL from the current schema in your actual data, if you have very many
> columns and datasets. It certainly would also be an interesting idea to
> pass a static Avro/JSON schema to the DDL.
>
> On Fri, Apr 2, 2021 at 10:57 AM Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Sumeet,
>>
>> I'm not a Table/SQL API expert, but from my knowledge, it's not viable
>> to derive SQL table schemas from Avro schemas, because table schemas
>> would be the ground truth by design.
>> Moreover, one Avro type can be mapped to multiple Flink types, so in
>> practice it may not be viable either.
>>
>> Best,
>> Paul Lam
>>
>> On Apr 2, 2021, at 11:34, Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>
>> Just realized, my question was probably not clear enough. :-)
>>
>> I understand that the Avro (or JSON, for that matter) format can be
>> ingested as described here:
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#apache-avro-format,
>> but this still requires the entire table specification to be written in
>> the "CREATE TABLE" section. Is it possible to just specify the Avro
>> schema and let Flink map it to an SQL table?
>>
>> BTW, the above link is titled "Table API Legacy Connectors", so is this
>> still supported? Same question for the YAML specification.
>>
>> Thanks,
>> Sumeet
>>
>> On Fri, Apr 2, 2021 at 8:26 AM Sumeet Malhotra <sumeet.malho...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to directly import an Avro schema while ingesting data
>>> into Flink? Or do we always have to specify the entire schema, either in
>>> SQL DDL for the Table API or using DataStream data types? From a code
>>> maintenance standpoint, it would be really helpful to keep one source of
>>> truth for the schema somewhere.
>>>
>>> Thanks,
>>> Sumeet