I need to merge them as one big parquet on a daily basis.
The source provided 15-min parquet chunks.
Any suggestions here will be helpful.
thanks
Sri
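One possible shape for this daily merge, sketched under assumptions the thread does not state: the 15-min chunks share one Avro schema, and the schema JSON and paths below are placeholders. The day's chunks are read back with ParquetIO.read and rewritten through ParquetIO.sink as a single shard.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DailyParquetCompaction {
  public static void main(String[] args) {
    // Placeholder schema; it must match the schema the 15-min chunks were written with.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"value\",\"type\":\"string\"}]}");

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadChunks", ParquetIO.read(schema).from("/data/chunks/2019-01-25/*.parquet"))
        .apply("WriteDaily",
            FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema))
                .to("/data/daily/2019-01-25/")
                .withSuffix(".parquet")
                .withNumShards(1)); // one output file per run; drop this to keep parallel shards

    p.run().waitUntilFinish();
  }
}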
From: Alexey Romanenko
Sent: Friday, January 25, 2019 10:31:37 AM
To: user@beam.apache.org
Subject: Re: ParquetIO write
From: Sridevi Nookala
Sent: Wednesday, January 23, 2019 9:41:02 PM
To: user@beam.apache.org
Subject: Re: ParquetIO write of CSV document data
Hi Alex,
Thanks for the suggestions. I am not there yet to solve BEAM jira's, but it will help immensely
if AVRO schema inference is avoided,
something like python pandas/pyarrow does.
thanks for your help
Sri
Sent: Tuesday, January 15, 2019 7:02:56 AM
To: user@beam.apache.org
Subject: Re: ParquetIO write of CSV document data
Hi Sri,
it's exactly as Alexey says, although there are plans/ideas to improve
ParquetIO in a way that would not require defining the schema manually.
Some Jiras that might be interesting in this topic but not yet resolved
(maybe you are willing to contribute?):
https://issues.apache.org/jira/bro
Hi Sri,
Afaik, you have to create a "PCollection" of "GenericRecord"s and define your
Avro schema manually to write your data into Parquet files.
In this case, you will need to create a ParDo for this translation. Also, I
expect that your schema is the same for all CSV files.
Basic example of us
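A minimal sketch of this approach (not the example referenced above): a ParDo translates each CSV line into a GenericRecord against a hand-written Avro schema, and ParquetIO.sink writes the Parquet files. The schema, column names, and paths are made-up placeholders, the comma split stands in for a real CSV parser, and the beam-sdks-java-io-parquet module is assumed to be on the classpath.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CsvToParquet {

  // Hand-written Avro schema; every CSV column is treated as a string here.
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"CsvRow\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"value\",\"type\":\"string\"}]}";

  // ParDo that translates one CSV line into a GenericRecord using the schema above.
  // Note: header lines are not skipped or specially handled in this sketch.
  static class CsvToRecordFn extends DoFn<String, GenericRecord> {
    private final String schemaJson;
    private transient Schema schema;

    CsvToRecordFn(String schemaJson) {
      this.schemaJson = schemaJson;
    }

    @Setup
    public void setup() {
      schema = new Schema.Parser().parse(schemaJson);
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      // Naive split; a real pipeline should use a proper CSV parser (quoting, escaping).
      String[] cols = c.element().split(",", -1);
      GenericRecord record = new GenericData.Record(schema);
      record.put("id", cols[0]);
      record.put("name", cols[1]);
      record.put("value", cols[2]);
      c.output(record);
    }
  }

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadCsv", TextIO.read().from("/path/to/input/*.csv"))
        .apply("CsvToGenericRecord", ParDo.of(new CsvToRecordFn(SCHEMA_JSON)))
        .setCoder(AvroCoder.of(schema))
        .apply("WriteParquet",
            FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema))
                .to("/path/to/output/")
                .withSuffix(".parquet"));

    p.run().waitUntilFinish();
  }
}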
hi,
I have a bunch of CSV data files that I need to store in Parquet format. I did
look at the basic documentation on ParquetIO, and ParquetIO.sink() can be used to
achieve the same.
However, there is a dependency on the Avro schema.
How do I infer/generate an Avro schema from CSV document data?
Doe
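On the inference question: ParquetIO itself does not infer a schema from CSV, so one crude workaround is to derive an all-string Avro schema from the CSV header row before constructing the pipeline. A sketch, with a placeholder path; real column typing, nullability, and sanitizing of header names into valid Avro field names are left out:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class CsvHeaderSchema {

  // Builds a flat Avro record schema from a CSV header line, typing every column as a string.
  // Header names must already be valid Avro field names (letters, digits, underscores).
  public static Schema fromHeader(String headerLine) {
    SchemaBuilder.FieldAssembler<Schema> fields =
        SchemaBuilder.record("CsvRow").namespace("example").fields();
    for (String column : headerLine.split(",")) {
      fields = fields.requiredString(column.trim());
    }
    return fields.endRecord();
  }

  public static void main(String[] args) throws IOException {
    // Read only the header line of one representative CSV file (placeholder path).
    try (BufferedReader reader = Files.newBufferedReader(Paths.get("/path/to/sample.csv"))) {
      Schema schema = fromHeader(reader.readLine());
      System.out.println(schema.toString(true));
    }
  }
}

The resulting Schema could then feed the ParDo/ParquetIO.sink pattern sketched earlier in the thread.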