Cool, thanks!
On Fri, Mar 12, 2021, 13:15 Arvid Heise wrote:
Hi Avi,
thanks for clarifying.
It seems like it's not possible to parse Parquet in Flink without knowing
the schema. What I'd do is to parse the metadata while setting up the job
and then pass it to the input format:
ParquetMetadata parquetMetadata =
    MetadataReader.readFooter(inputStream, path, fileSize);
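For reference, a minimal self-contained sketch of that metadata-reading step could look like the following. It uses the public parquet-hadoop ParquetFileReader/HadoopInputFile API rather than the MetadataReader helper, and the ParquetSchemas class name and the location argument are only illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.schema.MessageType;

public final class ParquetSchemas {

    // Reads the Parquet footer and returns the file's schema, so the job can be
    // wired up at submission time without hard-coding any schema in the code.
    public static MessageType readSchema(String location) throws Exception {
        Path path = new Path(location);
        try (ParquetFileReader reader =
                ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
            return reader.getFooter().getFileMetaData().getSchema();
        }
    }
}

The resulting MessageType can then be passed to the Parquet input format (e.g. ParquetRowInputFormat or ParquetMapInputFormat) when the job graph is built.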
Hi Arvid,
assuming that I have A0, B0, C0 parquet files with different schemas and a
common field *ID*, I want to write them to A1, B2, C3 files respectively. My
problem is that I do not want my code to know the full schema; I just want to
filter using the ID field and write the unfiltered lines to the corresponding
output file.
Hi Avi,
I'm not entirely sure I understand the question. Let's say you have sources
A, B, and C, all with different schemas, but all of them have an id. You could
use the ParquetMapInputFormat, which provides the records as maps, and just do
a map lookup.
However, I'm not sure how you want to write these records.
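To make the map-lookup concrete, here is a minimal sketch assuming Flink's legacy flink-parquet ParquetMapInputFormat and the DataStream API (Flink 1.12-era); the FilterById class, the location argument, and the literal "123" are placeholders, and the MessageType would come from reading the file's footer as above:

import java.util.Map;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetMapInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.parquet.schema.MessageType;

public final class FilterById {

    // Reads one Parquet source as Map records and keeps only the rows whose
    // "id" differs from "123"; no other field of the schema is referenced.
    public static DataStream<Map> filteredRecords(
            StreamExecutionEnvironment env, String location, MessageType fileSchema) {
        ParquetMapInputFormat format =
                new ParquetMapInputFormat(new Path(location), fileSchema);
        return env
                .createInput(format, TypeInformation.of(Map.class))
                .filter(record -> !"123".equals(String.valueOf(record.get("id"))));
    }
}

The same few lines could be repeated for each of the A/B/C sources, each with its own discovered schema and its own sink.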
Hi all,
I am trying to filter lines from parquet files. The problem is that they
have different schemas; however, the field that I am filtering on
exists in all of the schemas.
In Spark this is quite straightforward:
*val filtered = rawsDF.filter(col("id") =!= "123")*
I tried to do it in Flink by ext