About Memory Spilling to Disk in Flink

2021-03-23 Thread Roc Marshal
Hi, Can someone tell me where flink uses memory spilling to write to disk? Thank you. Best, Roc.

Re: Fault Tolerance with RocksDBStateBackend

2021-03-23 Thread Guowei Ma
Hi, You need some persistent storages(like hdfs) for the checkpoint. It is Flink's fault tolerance prerequisites.[1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/checkpointing.html#prerequisites Best, Guowei On Wed, Mar 24, 2021 at 1:21 PM Maminspapin wrote:

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-23 Thread Yang Wang
Hi Alexey, >From your attached logs, I do not think the new start JobManager will recover from the wrong savepoint. Because you could find the following logs to indicate that the HA related ConfigMaps have been cleaned up successfully. 1393 {"ts":"2021-03-20T02:02:18.506Z","message":"Finished cle

Re: Kubernetes Application Cluster Not Working

2021-03-23 Thread Yang Wang
Are you sure that the JobManager akka address is binded to "flink-jobmanager"? You could set "jobmanager.rpc.address" to flink-jobmanager in the ConfigMap. Best, Yang Guowei Ma 于2021年3月24日周三 上午10:22写道: > Hi, M > Could you give the full stack? This might not be the root cause. > Best, > Guowei >

Fail to cancel perJob for that deregisterApplication is not called

2021-03-23 Thread 刘建刚
I am using flink 1.10.0. My perJob can not be cancelled. From the log I find that webMonitorEndpoint.closeAsync() is completed but deregisterApplication is not called. The related code is as follows: public CompletableFuture deregisterApplicationAndClose( final ApplicationStatus appli

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, Thanks for your patience on all these asks! Best, Yik San On Wed, Mar 24, 2021 at 10:32 AM Dian Fu wrote: > It’s a good advice. I have created ticket > https://issues.apache.org/jira/browse/FLINK-21938 to track this. > > 2021年3月24日 上午10:24,Yik San Chan 写道: > > Hi Dian, > > As you sai

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Dian Fu
It’s a good advice. I have created ticket https://issues.apache.org/jira/browse/FLINK-21938 to track this. > 2021年3月24日 上午10:24,Yik San Chan 写道: > > Hi Dian, > > As you said, users can, but I got the impression that using ._func to access >

Re: Pyflink tutorial output

2021-03-23 Thread Dian Fu
How did you check the output when submitting to the kubernetes session cluster? I ask this because the output should be written to the local directory “/tmp/output” on the TaskManagers where the jobs are running on. Regards, Dian > 2021年3月24日 上午2:40,Robert Cullen 写道: > > I’m running this scr

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, As you said, users can, but I got the impression that using ._func to access the original Python function is not recommended, therefore not documented. While in Flink, unit testing a Scala/Java UDF is clearly documented and encouraged. Do I misread something? Best, Yik San On Wed, Mar

Re: Pyflink tutorial output

2021-03-23 Thread Shuiqiang Chen
Hi Robert, Have you tried exploring the /tmp/output directory in the task manager pods on you kubernetes cluster? The StreamingFileSink will create the output directory on the host of task manager in which the sink tasks are executed. Best, Shuiqiang Robert Cullen 于2021年3月24日周三 上午2:48写道: > I’m

Re: Kubernetes Application Cluster Not Working

2021-03-23 Thread Guowei Ma
Hi, M Could you give the full stack? This might not be the root cause. Best, Guowei On Wed, Mar 24, 2021 at 2:46 AM Claude M wrote: > Hello, > > I'm trying to setup Flink in Kubernetes using the Application Mode as > described here: > https://ci.apache.org/projects/flink/flink-docs-master/docs/

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Dian Fu
As I replied in previous email, it doesn’t block users to write tests for PyFlink UDFs. Users could use ._func to access the original Python function if they want. Regards, Dian > 2021年3月23日 下午2:39,Yik San Chan 写道: > > Hi Dian, > > However users do want to unit test their UDFs, as supported

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
That said, is there a way to get a dump of all metrics exposed by TM. I was searching for it and I bet we could get it for ServieMonitor on k8s ( scrape ) but am missing a way to het a TM and dump all metrics that are pushed. Thanks and regards. On Tue, Mar 23, 2021 at 5:56 PM Vishal Santoshi wr

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
I guess there is a bigger issue here. We dropped the property to 500. We also realized that this failure happened on a TM that had one specific job running on it. What was good ( but surprising ) that the exception was the more protocol specific 413 ( as in the chunk is greater then some size limi

Kubernetes Application Cluster Not Working

2021-03-23 Thread Claude M
Hello, I'm trying to setup Flink in Kubernetes using the Application Mode as described here: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes The doc mentions that there needs to be a aervice exposing the JobManager’s REST and UI port

Pyflink tutorial output

2021-03-23 Thread Robert Cullen
I’m running this script taken from the Flink website: tutorial.py python tutorial.py from pyflink.common.serialization import SimpleStringEncoder from pyflink.common.typeinfo import Types from pyflink.datastream import StreamExecutionEnvironment from pyflink.datastream.connectors import Streaming

Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread vishalovercome
Suppose i have a job with 3 operators with the following job graph: O1 => O2 // data stream partitioned by keyBy O1 => O3 // data stream partitioned by keyBy O2 => O3 // data stream partitioned by keyBy If operator O3 receives inputs from two operators and both inputs have the same type and valu

Re: Flink Streaming Counter

2021-03-23 Thread Vijayendra Yadav
Hi Pohl, Thanks for getting back to me so quickly. I am looking for a sample example where I can increment counters on each stage #1 thru #3 for DATASTREAM. Then probably I can print it using slf4j. Thanks, Vijay On Tue, Mar 23, 2021 at 6:35 AM Matthias Pohl wrote: > Hi Vijayendra, > thanks fo

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread Matthias Pohl
Hi Vishal, I'm not 100% sure what you're trying to do. But the partitioning by a key just relies on the key on the used parallelism. So, I guess, what you propose should work. You would have to rely on some join function, though, when merging two input operators into one again. I hope that was hel

Re: How to get operator uid from a sql

2021-03-23 Thread Matthias Pohl
Hi XU Qinghui, sorry for the late reply. Unfortunately, the operator ID does not mean to be accessible for Flink SQL through the API. You might have a chance to extract the Operator ID through the debug logs. StreamGraphHasherV2.generateDeterministicHash logs out the operator ID [1]: "[main] DEBUG

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
If we look at this code , the metrics are divided into chunks up-to a max size. and enqueued

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Dawid Wysakowicz
Hey, I would like to double check this with Jark and/or Timo. As far as DataStream is concerned the javadoc is correct. Moreover the pipeline.auto-watermak-interval and setAutoWatermarkInterval are effectively the same setting/option. However I am not sure if Table API interprets it in the same wa

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Matthias Pohl
Hi Aeden, sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this. Thanks, Matthias [1] https://issues.apache.org/jira/brows

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Could you provide the full stacktrace of your error? That might help me to dig into the code. Matthias On Tue, Mar 23, 2021 at 2:33 PM Sandeep khanzode wrote: > Hi Matthias, > > Thanks. But yes, I am comparing map with that.map … the comment is > probably for the previous variable name. > > I c

Re: Flink Streaming Counter

2021-03-23 Thread Matthias Pohl
Hi Vijayendra, thanks for reaching out to the Flink community. What do you mean by displaying it in your local IDE? Would it be ok to log the information out onto stdout? You might want to have a look at the docs about setting up a slf4j metrics report [1] if that's the case. Best, Matthias [1] h

Re: QueryableStateClient getKVState

2021-03-23 Thread Sandeep khanzode
Hi Matthias, Thanks. But yes, I am comparing map with that.map … the comment is probably for the previous variable name. I can use String, Int, Enum, Long type keys in the Key that I send in the Query getKvState … but the moment I introduce a TreeMap, even though it contains a simple one entry

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Hi Sandeep, the equals method does not compare the this.map with that.map but that.dimensions. ...at least in your commented out code. Might this be the problem? Best, Matthias On Tue, Mar 23, 2021 at 5:28 AM Sandeep khanzode wrote: > Hi, > > I have a stream that exposes the state for Queryable

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
There was a similar discussion recently in this mailing list about distributing the work onto different TaskManagers. One finding Xintong shared there [1] was that the parameter cluster.evenly-spread-out-slots is used to evenly allocate slots among TaskManagers but not how the tasks are actually di

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
Hi Vignesh, are you trying to achieve an even distribution of tasks for this one operator that has the parallelism set to 16? Or do you observe the described behavior also on a job level? I'm adding Chesnay to the thread as he might have more insights on this topic. Best, Matthias On Mon, Mar 22,

Re: Checkpoint fail due to timeout

2021-03-23 Thread Piotr Nowojski
Hi Alexey, You should definitely investigate why the job is stuck. 1. First of all, is it completely stuck, or is something moving? - Use Flink metrics [1] (number bytes/records processed), and go through all of the operators/tasks to check this. 2. The stack traces like the one you quoted: >

[DISCUSS] Feature freeze date for 1.13

2021-03-23 Thread Dawid Wysakowicz
Hi devs, users! 1. *Feature freeze date* We are approaching the end of March which we agreed would be the time for a Feature Freeze. From the knowledge I've gather so far it still seems to be a viable plan. I think it is a good time to agree on a particular date, when it should happen. We suggest

With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-23 Thread Haihang Jing
【Appearance】For jobs with the same configuration (checkpoint interval: 3 minutes, job logic: regular join), flink1.9 runs normally. After flink1.12 runs for a period of time, the checkpoint creation time increases, and finally the checkpoint creation fails. 【Analysis】After learning flink1.10, the

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread dhanesh arole
Hi Matthias, Thanks for taking to help us with this. You are right there were lots of task cancellations, as this exception causes the job to get restarted, triggering cancellations. - Dhanesh Arole On Tue, Mar 23, 2021 at 9:27 AM Matthias Pohl wrote: > Hi Danesh, > thanks for reaching out

Re: Flink on Minikube

2021-03-23 Thread Arvid Heise
Hi Sandeep, please have a look at [1], you should add most Flink dependencies as provided - exceptions are connectors (or in general stuff that is not in flink/lib/ or flink/plugins). [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/project-configuration.html#setting-up-a-project-ba

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread Matthias Pohl
Hi Danesh, thanks for reaching out to the Flink community. Checking the code, it looks like the OutputStream is added to a CloseableRegistry before writing to it [1]. My suspicion is - based on the exception cause - that the CloseableRegistry got triggered while restoring the state. I tried to tra

Re: Re: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-23 Thread Guowei Ma
Hi, I discussed with Xingtong and Yingjie offline and we agreed that the name `taskmanager.memory.framework.off-heap.batch-shuffle.size` can better reflect the current memory usage. So we decided to use the name Till suggested. Thank you all for your valuable feedback. Best, Guowei On Mon, Mar 22

Re: OrcTableSource in flink 1.12

2021-03-23 Thread Timo Walther
Hi Nikola, for the ORC source it is fine to use `TableEnvironment#fromTableSource`. It is true that this method is deprecated, but as I said not all connectors have been ported to be supported in the SQL DDL via string properties. Therefore, `TableEnvironment#fromTableSource` is still accessi

Re: Editing job graph at runtime

2021-03-23 Thread Arvid Heise
Hi, Option 2 is to implement your own Source/Sink. Currently, we have the old, discouraged interfaces along with the new interfaces. For source, you want to read [1]. There is a KafkaSource already in 1.12 that we consider Beta, you can replace it with the 1.13 after the 1.13 release (should be c

Re: Checkpoint fail due to timeout

2021-03-23 Thread Roman Khachatryan
Unfortunately, the lock can't be changed as it's part of the public API (though it will be eliminated with the new source API in FLIP-27). Theoretically, the change you've made should improve checkpointing at the cost of throughput. Is it what you see? But the new stack traces seem strange to me

Re: Editing job graph at runtime

2021-03-23 Thread Jessy Ping
Hi Arvid, Thanks for the reply. I am currently exploring the flink features and we have certain use cases where new producers will be added the system dynamically and we don't want to restart the application frequently. It will be helpful if you explain the option 2 in detail ? Thanks & Regards