Hi,

I have a bunch of CSV data files that I need to store in Parquet format. I 
looked at the basic documentation for ParquetIO, and ParquetIO.sink() can be 
used to achieve this.

However, ParquetIO.sink() depends on an Avro schema.

How do I infer or generate an Avro schema from CSV data?

Does Beam have any API for this?
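In case it helps frame the question, here is a minimal sketch of a hand-rolled alternative that does not rely on Beam or Kite. It assumes a plain comma-separated header with no quoted commas and column names that are already valid Avro names, and it types every column as a nullable string. The record name and the namespace `com.example.csv` are hypothetical placeholders. The resulting JSON string could then be parsed with Avro's `new Schema.Parser().parse(json)` and passed to `ParquetIO.sink(schema)`.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvHeaderToAvroSchema {

  // Builds an Avro record schema (as a JSON string) from a CSV header line.
  // Assumptions (hypothetical, for illustration): comma-separated header with no
  // quoted commas, names already valid Avro names, every column a nullable string.
  static String schemaFromHeader(String headerLine, String recordName, String namespace) {
    List<String> fields = new ArrayList<>();
    for (String raw : headerLine.split(",", -1)) {
      String name = raw.trim();
      // Nullable string field with a null default, so empty cells can be stored as null
      fields.add("{\"name\":\"" + name + "\",\"type\":[\"null\",\"string\"],\"default\":null}");
    }
    return "{\"type\":\"record\",\"name\":\"" + recordName
        + "\",\"namespace\":\"" + namespace
        + "\",\"fields\":[" + String.join(",", fields) + "]}";
  }

  public static void main(String[] args) {
    String header = "id,name,city";
    System.out.println(schemaFromHeader(header, "CsvRow", "com.example.csv"));
  }
}
```

Typing everything as a string is the simplest option that never fails on odd values; numeric types would need a separate inference pass over the data.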

I tried using the Kite SDK's CSVUtil / JsonUtil APIs, but had no luck 
generating an Avro schema.

My CSV data files have headers, and quite a few of the header fields are 
hyphenated, which Kite's CSVUtil does not accept.
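Avro names must start with a letter or underscore and contain only `[A-Za-z0-9_]`, so a hyphenated header such as `order-id` is not a legal field name. A minimal sketch of one workaround, assuming it is acceptable to rename columns: replace every illegal character with `_`, prefix names that start with a digit, and add a numeric suffix when two headers collide after renaming. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AvroNameSanitizer {

  // Maps one raw header to a legal Avro name: [A-Za-z_][A-Za-z0-9_]*
  static String sanitize(String raw) {
    String name = raw.trim().replaceAll("[^A-Za-z0-9_]", "_");
    if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
      name = "_" + name;
    }
    return name;
  }

  // Sanitizes every header and appends _2, _3, ... when two names collide
  static List<String> sanitizeHeader(String[] headers) {
    Set<String> seen = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (String h : headers) {
      String base = sanitize(h);
      String candidate = base;
      int suffix = 2;
      // Set.add returns false if the name is already taken
      while (!seen.add(candidate)) {
        candidate = base + "_" + suffix++;
      }
      result.add(candidate);
    }
    return result;
  }

  public static void main(String[] args) {
    String[] header = "order-id,customer-name,2nd-address,order_id".split(",");
    System.out.println(sanitizeHeader(header));
    // prints [order_id, customer_name, _2nd_address, order_id_2]
  }
}
```

If the original header text needs to be preserved, one option is to record it in each field's `doc` attribute in the generated schema.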


I think it would be redundant effort to convert the CSV documents to JSON 
documents first.

Any suggestions on how to infer an Avro schema from CSV data, or from a JSON 
schema, would be helpful.
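To infer types from the CSV data itself rather than typing every column as a string, a minimal sketch might scan a sample of rows and keep the narrowest Avro primitive that fits every non-empty value in each column: `long`, then `double`, then `string`. This is an illustrative heuristic, not a Beam or Kite API. It assumes rows are already split into cells, treats empty cells as nulls, and falls back to `string` for columns with no values. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvTypeInference {

  // Decimal or scientific notation; deliberately rejects "NaN", "Infinity", "1d", etc.
  private static final String DECIMAL = "[-+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?";

  // Narrowest Avro primitive type name for a single non-empty value
  static String typeOf(String value) {
    try {
      Long.parseLong(value);
      return "long";
    } catch (NumberFormatException e) {
      // not a 64-bit integer; fall through
    }
    return value.matches(DECIMAL) ? "double" : "string";
  }

  // Widens two observed types: long + double -> double, anything + string -> string
  static String widen(String a, String b) {
    if (a == null) return b;
    if (b == null || a.equals(b)) return a;
    if (!a.equals("string") && !b.equals("string")) return "double";
    return "string";
  }

  // One inferred type per column; empty cells are skipped (treated as null)
  static List<String> inferTypes(List<String[]> rows, int columnCount) {
    String[] types = new String[columnCount];
    for (String[] row : rows) {
      for (int i = 0; i < columnCount && i < row.length; i++) {
        String cell = row[i].trim();
        if (!cell.isEmpty()) {
          types[i] = widen(types[i], typeOf(cell));
        }
      }
    }
    List<String> result = new ArrayList<>();
    for (String t : types) {
      result.add(t == null ? "string" : t); // no values seen: default to string
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> sample = Arrays.asList(
        new String[] {"1", "3.5", "Alice"},
        new String[] {"2", "4", "Bob"},
        new String[] {"3", "", "Carol"});
    System.out.println(inferTypes(sample, 3));
    // prints [long, double, string]
  }
}
```

Each inferred type could then be wrapped as `["null", type]` in the generated schema, since any column may contain empty cells. Inference from a sample is only as reliable as the sample: a later row with a non-numeric value would not fit a `long` column.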


Thanks,

Sri
