Hi francois, > Is there any plan to give avro schemas a better role in Flink in further versions? Haven't heard about avro for csv. You can open a jira for it. Maybe also contribute to flink :-)
On Tue, Jul 10, 2018 at 11:32 PM, françois lacombe < francois.laco...@dcbrain.com> wrote: > Hi Hequn, > > 2018-07-10 3:47 GMT+02:00 Hequn Cheng <chenghe...@gmail.com>: > >> Maybe I misunderstand you. So you don't want to skip the whole file? >> > Yes I do > By skipping the whole file I mean "throw an Exception to stop the process > and inform user that file is invalid for a given reason" and not "the > process goes fully right and import 0 rows" > > >> If does, then "extending CsvTableSource and provide the avro schema to >> the constructor without creating a custom AvroInputFormat" is ok. >> > > Then we agree on this > Is there any plan to give avro schemas a better role in Flink in further > versions? > Avro schemas are perfect to build CSVTableSource with code like > > for (Schema field_nfo : sch.getTypes()){ > // Test if csv file header actually contains a field corresponding to > schema > if (!csv_headers.contains(field_nfo.getName())) { > throw new NoSuchFieldException(field_nfo.getName()); > } > > // Declare the field in the source Builder > src_builder.field(field_nfo.getName(), primitiveTypes.get(field_nfo. > getType())); > } > > All the best > > François > > > >> On Mon, Jul 9, 2018 at 11:03 PM, françois lacombe < >> francois.laco...@dcbrain.com> wrote: >> >>> Hi Hequn, >>> >>> 2018-07-09 15:09 GMT+02:00 Hequn Cheng <chenghe...@gmail.com>: >>> >>>> The first step requires an AvroInputFormat because the source needs >>>> AvroInputFormat to read avro data if data match schema. >>>> >>> >>> I don't want avro data, I just want to check if my csv file have the >>> same fields than defined in a given avro schema. >>> Processing should stop if and only if I find missing columns. >>> >>> A record which not match the schema (types mainly) should be rejected >>> and logged in a dedicated file but the processing can go on. >>> >>> How about extending CsvTableSource and provide the avro schema to the >>> constructor without creating a custom AvroInputFormat? >>> >>> >>> François >>> >> >> >