RE: Helper methods for catching unexpected key changes?

2021-10-07 Thread Schwalbe Matthias
Good morning Dan, Being short of information on how you arranged your job, I can only make general comments: ReinterpretAsKeyedStream only applies to data streams that are in fact partitioned by the same key, i.e. your job would look somewhat like this: DataStreamUtils.reinterpretAsKeyedStream

Helper methods for catching unexpected key changes?

2021-10-07 Thread Dan Hill
Hi. I'm getting the following errors when using reinterpretAsKeyedStream. I don't expect the key to change for rows in reinterpretAsKeyedStream. Are there any utilities that I can use that I can use with reinterpetAsKeyedStream to verify that the key doesn't change? E.g. some wrapper operator?

Re: Kubernetes HA - Reusing storage dir for different clusters

2021-10-07 Thread Yang Wang
When the Flink job reached to global terminal state(FAILED, CANCELED, FINISHED), all the HA related data(including pointers in ConfigMap and concrete data in DFS) will be cleaned up automatically. Best, Yang Alexis Sarda-Espinosa 于2021年10月4日周一 下午3:59写道: > Hello, > > > > If I deploy a Flink-Kube

Re: Can BroadcastProcessFunction invoke both methods concurrently?

2021-10-07 Thread Caizhi Weng
Hi! Just like what you said, they won't be invoked concurrently. Flink is using the actor model in runtime so methods in operators won't be called at the same time. By the caching layer I suppose you would like to store the broadcast messages into the java map for some time and periodically store

Re: jdbc connector configuration

2021-10-07 Thread Caizhi Weng
Hi! These configurations are not required to merely read from a database. They are here to accelerate the reads by allowing sources to read data in parallel. This optimization works by dividing the data into several (scan.partition.num) partitions and each partition will be read by a task slot (n

Re: Exceeded Checkpoint tolerable failure threshold Exception

2021-10-07 Thread Caizhi Weng
Hi! You need to look into the root cause of checkpoint failure. You can see the "Checkpoint" tab to see if checkpointing timeout occurs or see the "Exception" tab for exception messages other than this one. You can also dive into the logs for suspicious information. If checkpoint failures are rar

Re: Pyflik job data stream to table conversion declareManagedMemory exception

2021-10-07 Thread Dian Fu
Hi Kamil, I have checked that this method exists in 1.12.3: https://github.com/apache/flink/blob/release-1.12.3/flink-python/src/main/java/org/apache/flink/python/util/PythonConfigUtil.java#L137 Could you double check whether the Flink version is 1.12.3 (not just the PyFlink version)? Regards, D

Re: How to create backpressure with a Statefun remote function?

2021-10-07 Thread Igal Shilman
Hello Christian, The challenge with generic back pressure and remote functions, is that StateFun doesn't know if it targets a single process or a fleet of processes behind a load balancer and an autoscaler. Triggering back pressure too early might never kick in the autoscaling. Indeed that parame

Exceeded Checkpoint tolerable failure threshold Exception

2021-10-07 Thread Robert Cullen
I have Flink set up with 2 taskmanagers and one jobmanager. I've allocated 25 gb of JVM Heap and 15 gb of Flink managed memory. I have 2 jobs running. After 3 hours this exception was thrown. How can I configure flink to prevent this from happening? 2021-10-07 12:38:50 org.apache.flink.util.Fl

RE: FlinkJobNotFoundException

2021-10-07 Thread Gusick, Doug S
Hi Matthias, I just wanted to follow up here. Were you able to access the jobmanager log? If so, were you able to find anything around the issues we have been facing? Best, Doug From: Hailu, Andreas [Engineering] Sent: Thursday, September 30, 2021 8:56 AM To: Matthias Pohl ; Gusick, Doug S [En

How to create backpressure with a Statefun remote function?

2021-10-07 Thread Christian Krudewig (Corporate Development)
Hello fellow Flink users, How do you create backpressure with Statefun remote functions? I'm using an asynchronous web server for the remote function (Python aiohttp on uvicorn) which accepts more requests than its CPU bound backend can handle. That can make the requests time out and can trigger a

AW: Deploying python statefun program on standalone Flink cluster

2021-10-07 Thread Christian Krudewig (Corporate Development)
Hello Le, The whole charm of statefun from my point of view comes with the remote functions. Especially on kubernetes it gives you the option to scale and deploy the remote function with the core logic independent of the flink worker/manager. Examples are in the playground repository: htt

Re: Deploying python statefun program on standalone Flink cluster

2021-10-07 Thread Igal Shilman
Hello Le, Currently the only way to execute a Python function with StateFun is through a remote function. This means that you need to host the function separately. [1] Good luck! Igal [1] https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/modules/http-endpoint/ On Thu, Oct 7, 20