Hi,
I am trying to collect files from HDFS in my DataStream job. I need to
collect two types of files - CSV and Parquet.
I understand that Flink supports both formats, but in streaming mode Flink
doesn't support splitting these formats; splitting is only supported in the
Table API.
I wanted to under
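For reference, a minimal sketch of reading Parquet records from HDFS with the
DataStream FileSource; the schema, path and job name below are placeholders,
and CSV files can be read the same way by swapping in a CsvReaderFormat:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParquetFromHdfs {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder Avro schema; replace it with the real schema of your files.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        // Each Parquet file is read as a stream of GenericRecords.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema),
                        new Path("hdfs:///data/parquet/"))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-files")
           .print();
        env.execute("read-parquet-from-hdfs");
    }
}

Calling monitorContinuously(...) on the FileSource builder keeps the source
scanning the directory for new files in streaming mode.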
Hello Community,
Please share your views on the below.
Rgds,
Kamal
From: Kamal Mittal via user
Sent: 16 August 2023 04:35 PM
To: user@flink.apache.org
Subject: Flink AVRO to Parquet writer - Row group size/Page size
Hello,
For Parquet, the default row group size is 128 MB and the page size is 1 MB, but Flink
Hi Krzysztof,
You may want to check flink-kubernetes-operator-api (
https://mvnrepository.com/artifact/org.apache.flink/flink-kubernetes-operator-api),
here's an example for reading FlinkDeployments
https://github.com/sap1ens/heimdall/blob/main/src/main/java/com/sap1ens/heimdall/kubernetes/FlinkDe
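Roughly, with the CRD model classes from that artifact and the fabric8 client it
looks like the sketch below; note the FlinkDeployment package has moved between
operator releases, and the namespace name here is made up:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import org.apache.flink.kubernetes.operator.api.FlinkDeployment;

public class ListFlinkDeployments {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // List all FlinkDeployment custom resources in one namespace and
            // print their JobManager deployment status.
            client.resources(FlinkDeployment.class)
                  .inNamespace("flink-jobs")
                  .list()
                  .getItems()
                  .forEach(d -> System.out.println(
                          d.getMetadata().getName() + " -> "
                                  + d.getStatus().getJobManagerDeploymentStatus()));
        }
    }
}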
Hi,
I have a use case where I would like to run Flink jobs using the Apache Flink
Kubernetes operator, where actions like job submission (new and from a
savepoint), job cancellation with a savepoint, and cluster creation will be
triggered from a Java-based microservice.
Is there any recommended/dedicated Java API for Fl
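With the operator, job submission typically means creating a FlinkDeployment
custom resource from your service. A minimal sketch, assuming the fabric8 client
and the flink-kubernetes-operator-api model classes (names, image, jar URI and
package locations below are placeholders and vary by operator version):

import io.fabric8.kubernetes.api.model.ObjectMetaBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import org.apache.flink.kubernetes.operator.api.FlinkDeployment;
import org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec;
import org.apache.flink.kubernetes.operator.api.spec.FlinkVersion;
import org.apache.flink.kubernetes.operator.api.spec.JobSpec;
import org.apache.flink.kubernetes.operator.api.spec.UpgradeMode;

public class SubmitFlinkJob {
    public static void main(String[] args) {
        FlinkDeployment deployment = new FlinkDeployment();
        deployment.setMetadata(new ObjectMetaBuilder()
                .withName("my-job")
                .withNamespace("flink-jobs")
                .build());

        FlinkDeploymentSpec spec = new FlinkDeploymentSpec();
        spec.setImage("flink:1.17");
        spec.setFlinkVersion(FlinkVersion.v1_17);
        spec.setServiceAccount("flink");

        JobSpec job = new JobSpec();
        job.setJarURI("local:///opt/flink/usrlib/my-job.jar");
        job.setParallelism(2);
        // SAVEPOINT upgrade mode makes the operator stop/restart the job via savepoints.
        job.setUpgradeMode(UpgradeMode.SAVEPOINT);
        spec.setJob(job);
        deployment.setSpec(spec);

        // Creating the custom resource is the "job submission"; the operator
        // reconciles it into a running Flink cluster and job.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            client.resource(deployment).inNamespace("flink-jobs").createOrReplace();
        }
    }
}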
Hello,
For Parquet, the default row group size is 128 MB and the default page size is
1 MB, but the Flink bulk writer using the file sink creates files based on the
checkpointing interval only.
So is there any significance to the configured row group size and page size for
the Flink Parquet bulk writer? How does Flink use these two
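For context, row group size and page size control the internal layout of each
Parquet file, while checkpointing only controls when the sink rolls a file. A
minimal sketch of where those settings would be plugged in, assuming the
AvroParquetWriter builder from parquet-avro and Flink's ParquetWriterFactory
(schema, output path and sizes are placeholders):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetSinkWithSizes {
    public static FileSink<GenericRecord> build(String schemaJson) {
        // The schema is passed as a String so the lambda stays serializable.
        ParquetWriterFactory<GenericRecord> factory = new ParquetWriterFactory<>(out ->
                AvroParquetWriter.<GenericRecord>builder(out)
                        .withSchema(new Schema.Parser().parse(schemaJson))
                        .withRowGroupSize(64 * 1024 * 1024) // 64 MB row groups
                        .withPageSize(1024 * 1024)          // 1 MB pages
                        .withCompressionCodec(CompressionCodecName.SNAPPY)
                        .build());

        // With bulk formats the sink rolls files on checkpoint, so each file
        // contains one or more row groups of at most the configured size.
        return FileSink
                .forBulkFormat(new Path("hdfs:///data/out"), factory)
                .build();
    }
}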
Hi, Dennis.
As Ron said, we can judge this situation by the metrics.
We usually report the metrics to an external system such as Prometheus via a
metric reporter [1], and these metrics can then be visualized with tools like
Grafana [2].
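For example, the Prometheus reporter can be enabled in flink-conf.yaml with
something like the following (the exact keys depend on the Flink version):

metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9260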
Best,
Hang
[1]
https://nightlies.apache.org/flink/fl