Flatmap "blocking" or not

2024-06-07 Thread Alexandre KY
Hi, I am designing a Flink pipeline to process a stream of images (rasters, to be more accurate, which are quite heavy: up to a dozen GB). To distribute the processing of one image, we split it into tiles, apply the processing steps that don't require the whole image, and then reassemble it. Tile

Flink State and Filesystem sink

2024-07-05 Thread Alexandre KY
Hello, I am trying to implement a satellite image processing chain. Satellite images are stored as rasters, which are heavy (several GB), in a file system (I am currently using HDFS for testing purposes but will move to S3 when I deploy it on the cloud). So in order to reduce the processing ti

Flink Serialisation

2024-07-10 Thread Alexandre KY
Hello, I was wondering if Flink has a size limit for serializing data. I have an object that stores a big 2D array, and when I try to hand it over to the next operator, I get the following error: ``` 2024-07-10 10:14:51,983 ERROR org.apache.flink.runtime.util.ClusterUncaughtExceptionHandler [] - WAR

RE: Flink Serialisation

2024-07-10 Thread Alexandre KY
ss to next operator which will then pull the data. From: Alexandre KY Sent: Wednesday, July 10, 2024 12:32:03 To: user Subject: Flink Serialisation Hello, I was wondering if Flink has a size limit to serialize data. I have an object that stores a big 2D array and whe

RE: Taskslots usage

2024-07-15 Thread Alexandre KY
Hello, Thank you for your answers, I now understand Flink's behavior. Thank you and best regards, Ky Alexandre From: Aleksandr Pilipenko Sent: Friday, July 12, 2024 19:42:06 To: Alexandre KY Cc: user Subject: Re: Taskslots usage Hello Alexandre,

Logs in application mode

2024-08-27 Thread Alexandre KY
Hello, I am trying to run a Flink job in application mode. Since the cluster is removed after the execution of the job, the logs become inaccessible once the job is completed. Is there a way to access them after the execution of the job? Best regards, Ky Alexandre

PyFlink on EMR on EKS

2024-09-02 Thread Alexandre KY
Hello, I am trying to deploy my application on Amazon EMR on EKS. My application is in Python and from what I have read, I must create my own custom image. I tried to run a simple test with a very simple Flink job. Here is the Flink application: ```python # main.py import logging import sys f