Splitting in Stream Formats for File Source

2023-08-16 Thread Chirag Dewan via user
Hi, I am trying to collect files from HDFS in my DataStream job. I need to collect two types of files: CSV and Parquet. I understand that Flink supports both formats, but in streaming mode Flink doesn't support splitting these formats. Splitting is only supported in the Table API. I wanted to under

RE: Flink AVRO to Parquet writer - Row group size/Page size

2023-08-16 Thread Kamal Mittal via user
Hello Community, Please share your views on the question below. Rgds, Kamal From: Kamal Mittal via user Sent: 16 August 2023 04:35 PM To: user@flink.apache.org Subject: Flink AVRO to Parquet writer - Row group size/Page size Hello, For Parquet, the default row group size is 128 MB and the page size is 1 MB, but Flink

Re: Flink k8s operator - managed from java microservice

2023-08-16 Thread Yaroslav Tkachenko
Hi Krzysztof, You may want to check flink-kubernetes-operator-api (https://mvnrepository.com/artifact/org.apache.flink/flink-kubernetes-operator-api). Here's an example of reading FlinkDeployments: https://github.com/sap1ens/heimdall/blob/main/src/main/java/com/sap1ens/heimdall/kubernetes/FlinkDe

Flink k8s operator - managed from java microservice

2023-08-16 Thread Krzysztof Chmielewski
Hi, I have a use case where I would like to run Flink jobs using the Apache Flink k8s operator, with actions like job submission (new and from a savepoint), job cancellation with a savepoint, and cluster creation triggered from a Java-based microservice. Is there any recommended/dedicated Java API for Fl

Flink AVRO to Parquet writer - Row group size/Page size

2023-08-16 Thread Kamal Mittal via user
Hello, For Parquet, the default row group size is 128 MB and the page size is 1 MB, but the Flink bulk writer using the file sink creates files based on the checkpointing interval only. So is there any significance to the configured row group size and page size for the Flink Parquet bulk writer? How does Flink use these two

Re: [Question] Good way to monitor data skewness

2023-08-16 Thread Hang Ruan
Hi, Dennis. As Ron said, we can assess this situation from the metrics. We usually report the metrics to an external system such as Prometheus via a metric reporter[1]. These metrics can then be displayed by other tools such as Grafana[2]. Best, Hang [1] https://nightlies.apache.org/flink/fl