Hi Everyone,

Hope you are doing well. We are currently using the Flink Table API (Flink version 1.12.0) to stream data from Kafka and store it in Google Cloud Storage, using Parquet as the file format. Initially the Flink job worked perfectly fine and we were able to stream data and store it successfully in Google Cloud Storage. However, once we increase the cardinality of the input data and also increase the volume of data sent to Kafka, i.e. stream more events per second, the Flink job throws the following errors:
1. java.lang.OutOfMemoryError: GC overhead limit exceeded
2. java.lang.OutOfMemoryError: Java heap space

We tried running Flink on a Kubernetes cluster and on YARN. In both cases, as the volume of data increased, we saw the above errors. We provided two task managers with 10 GB each and 1 GB for the job manager. The checkpoint interval for our Flink job is 3 minutes.

I am aware that a bug has been filed for this: https://issues.apache.org/jira/browse/FLINK-20945. Please let me know if there is a way to work around this issue and when FLINK-20945 is expected to be resolved. We are trying to do a test run with some of our customers, so this is a production blocker for us.

Regards,
Aswin

From: Sivaraman Venkataraman, Aswin Ram <aswin.ram.sivaraman.venkatara...@sap.com>
Date: Monday, February 15, 2021 at 12:15 AM
To: dev@flink.apache.org <dev@flink.apache.org>
Cc: Sivaraman Venkataraman, Aswin Ram <aswin.ram.sivaraman.venkatara...@sap.com>
Subject: Out of Memory Error-Heap when storing parquet files using Flink Table API (Flink version-1.12.0) in Google Cloud Storage

Hi Everyone,

We are currently using the Flink Table API (Flink version 1.12.0) to stream data from Kafka and store it in Google Cloud Storage, using Parquet as the file format. Initially the Flink job worked perfectly fine and we were able to stream data and store it successfully in Google Cloud Storage. However, once we increase the cardinality of the input data and also increase the rate of data generated to Kafka, i.e. stream more events per second, the Flink job throws the following errors:

1. java.lang.OutOfMemoryError: GC overhead limit exceeded
2. java.lang.OutOfMemoryError: Java heap space

Initially I provided 4 GB each to the job manager and the task manager. I started Flink's YARN session with the following command:

./bin/yarn-session.sh -jm 4096m -tm 4096m -s 3

One thing we noticed was that after increasing the memory provided to the job manager and the task manager, i.e. restarting the YARN session with the following parameters:

./bin/yarn-session.sh -jm 10240m -tm 10240m -s 3

the error was no longer thrown by the Flink job.

The question I have is: other than increasing the JVM heap size, is there any other way in Flink to prevent this Java heap out-of-memory error? I am aware that Flink, while writing Parquet files, buffers the data in memory before flushing it to disk. So when we increased the cardinality of the data, it may have led to more data being buffered in memory, thereby causing this error. Additionally, the checkpoint interval for our Flink job is 3 minutes.

Please let me know if you need any more information to further understand the problem.

Regards,
Aswin
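For illustration, here is a minimal sketch of the kind of Table API job described above: a Kafka source feeding a filesystem sink with Parquet format on a GCS path, checkpointing every 3 minutes. The schema, topic name, bucket path, and connector options are placeholders assumed for the example, not the actual pipeline.

import java.time.Duration;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class KafkaToGcsParquetJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 3-minute checkpoint interval, as in the setup described above.
        env.enableCheckpointing(Duration.ofMinutes(3).toMillis());

        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        // Kafka source (topic, bootstrap servers, and schema are placeholders).
        tableEnv.executeSql(
                "CREATE TABLE events (" +
                "  event_id STRING," +
                "  device_id STRING," +
                "  payload STRING," +
                "  event_time TIMESTAMP(3)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'events'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'properties.group.id' = 'flink-gcs-writer'," +
                "  'scan.startup.mode' = 'latest-offset'," +
                "  'format' = 'json'" +
                ")");

        // Filesystem sink writing Parquet to a GCS path (placeholder bucket).
        // Each in-progress Parquet file keeps an in-memory row-group buffer
        // until it is flushed, so more data per checkpoint interval and more
        // open partition writers mean more data held on the heap.
        tableEnv.executeSql(
                "CREATE TABLE gcs_sink (" +
                "  event_id STRING," +
                "  device_id STRING," +
                "  payload STRING," +
                "  event_time TIMESTAMP(3)," +
                "  dt STRING" +
                ") PARTITIONED BY (dt) WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = 'gs://my-bucket/events/'," +
                "  'format' = 'parquet'," +
                "  'sink.rolling-policy.file-size' = '128MB'," +
                "  'sink.rolling-policy.rollover-interval' = '3 min'" +
                ")");

        // Continuous streaming insert from Kafka into the Parquet sink.
        tableEnv.executeSql(
                "INSERT INTO gcs_sink " +
                "SELECT event_id, device_id, payload, event_time, " +
                "       DATE_FORMAT(event_time, 'yyyy-MM-dd') " +
                "FROM events");
    }
}

Since Parquet is a bulk format, the filesystem sink only finalizes pending files on checkpoints, and each open Parquet writer buffers its current row group in memory, so higher input rates and higher partition cardinality within a checkpoint interval translate fairly directly into higher heap usage.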