Re: Re: pojo warning when using auto generated protobuf class

2021-04-26 Thread Yun Gao
Hi Prashant, Flink should always give warnings as long as the deduced result is GenericType, no matter it uses the default kryo serializer or the register one, thus if you have registered the type, you may simply ignore the warnings. To make sure it works, you may find the tm that the source ta

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Yun Tang
Hi Dan, I think you might use older version of Flink and this problem has been resolved by FLINK-16753 [1] after Flink-1.10.3. [1] https://issues.apache.org/jira/browse/FLINK-16753 Best Yun Tang From: Robert Metzger Sent: Monday, April 26, 2021 14:46 To: Dan H

Re: MemoryStateBackend Issue

2021-04-26 Thread Matthias Pohl
I'm not sure what you're trying to achieve. Are you trying to simulate a task failure? Or are you trying to pick up the state from a stopped job? You could achieve the former one by killing the TaskManager instance or by throwing a custom failure as part of your job pipeline. The latter one can be

Deployment/Memory Configuration/Scalability

2021-04-26 Thread Radoslav Smilyanov
Hi all, I am having multiple questions regarding Flink :) Let me give you some background of what I have done so far. *Description* I am using Flink 1.11.2. My job is doing data enrichment. Data is consumed from 6 different kafka topics and it is joined via multiple CoProcessFunctions. On a daily

Re: Flink Event specific window

2021-04-26 Thread Swagat Mishra
Hi Arvid, On 2 - I was referring to stateful functions as an alternative to windows, but in this particular use case, its not fitting in exactly I think, though a solution can be built around it. On the overall approach here what's the right way to use Flink SQL: Every event has the transaction

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Till Rohrmann
Hi Tony, I think you are right that Flink's cli does not behave super consistent at the moment. Case 2. should definitely work because `-t yarn-application` should overwrite what is defined in the Flink configuration. The problem seems to be that we don't resolve the configuration wrt the specifie

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Yangze Guo
Hi, Till, I agree that we need to resolve the issue by overriding the configuration before selecting the CustomCommandLines. However, IIUC, after FLINK-15852 the GenericCLI should always be the first choice. Could you help me to understand why the FlinkYarnSessionCli can be activated? Best, Yang

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Till Rohrmann
I think you are right that the `GenericCLI` should be the first choice. >From the top of my head I do not remember why FlinkYarnSessionCli is still used. Maybe it is in order to support some Yarn specific cli option parsing. I assume it is either an oversight or some parsing has not been completely

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Yangze Guo
If the GenericCLI is selected, then the execution.target should have been overwritten to "yarn-application" in GenericCLI#toConfiguration. It is odd that why the GenericCLI#isActive return false as the execution.target is defined in both flink-conf and command line. Best, Yangze Guo On Mon, Apr 2

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Tony Wei
Hi Till, Yangze, I think FLINK-15852 should solve my problem. It is my fault that my flink version is not 100% consistent with the community version, and FLINK-15852 is the one I missed. Thanks for your information. best regards, Till Rohrmann 於 2021年4月26日 週一 下午5:14寫道: > I think you are right

Flink Hive connector: hive-conf-dir supports hdfs URI, while hadoop-conf-dir supports local path only?

2021-04-26 Thread Yik San Chan
Hi community, This question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67264156/flink-hive-connector-hive-conf-dir-supports-hdfs-uri-while-hadoop-conf-dir-sup In my current setup, local dev env can access testing env. I would like to run Flink job on local dev env, whil

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Tony Wei
Hi Till, I have created the ticket to extend the description of `execution.targe`. https://issues.apache.org/jira/browse/FLINK-22476 best regards, Tony Wei 於 2021年4月26日 週一 下午5:24寫道: > Hi Till, Yangze, > > I think FLINK-15852 should solve my problem. > It is my fault that my flink version is no

PyFlink: Shall we disallow relative URL for filesystem path?

2021-04-26 Thread Yik San Chan
Hi community, When using Filesystem SQL Connector, users need to provide a path. When running a PyFlink job using the mini-cluster mode by simply `python WordCount.py`, the path can be a relative path, such as, `words.txt`. However, trying to submit the job to `flink run` will fail without questio

Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi community, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/python/python_config.html, regarding python.files: > Attach custom python files for job. This makes readers think only Python files are allowed here. However, in https://ci.apache.org/projects/flink/flink-docs-stable/dep

Confusing docs on python.archives

2021-04-26 Thread Yik San Chan
Hi community, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/python/python_config.html#python-options , > For each archive file, a target directory is specified. If the target directory name is specified, the archive file will be extracted to a name can directory with the specified

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Dian Fu
Hi Yik San, 1) what `--pyFiles` is used for: All the files specified via `--pyFiles` will be put in the PYTHONPATH of the Python worker during execution and then they will be available for the Python user-defined functions during execution. 2) validate for the files passed to `--pyFiles` Curre

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
Hi Yik San, It should be a typo issue. I guess it should be `If the target directory name is specified, the archive file will be extracted to a directory with the specified name.` Regards, Dian > 2021年4月26日 下午8:57,Yik San Chan 写道: > > Hi community, > > In > https://ci.apache.org/projects/f

Re: Task Local Recovery with mountable disks in the cloud

2021-04-26 Thread Till Rohrmann
Hi Sonam, sorry for the late reply. We were a bit caught in the midst of the feature freeze for the next major Flink release. In general, I think it is a very good idea to disaggregate the local state storage to make it reusable across TaskManager failures. However, it is also not trivial to do.

Re: The wrong Options of Kafka Connector, will make the cluster can not run any job

2021-04-26 Thread Robert Metzger
Thanks a lot for your message. This could be a bug in Flink. It seems that the archival of the execution graph is failing because some classes are unloaded. What I observe from your stack traces is that some classes are loaded from flink-dist_2.11-1.11.2.jar, while other classes are loaded from te

Re: Flink missing Kafka records

2021-04-26 Thread Robert Metzger
Hi Dan, Can you describe under which conditions you are missing records (after a machine failure, after a Kafka failure, after taking and restoring from a savepoint, ...). Are many records missing? Are "the first records" or the "latest records" missing? Any individual records missing, or larger b

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-26 Thread Robert Metzger
Quick comment on the kryo type registration and the messages you are seeing: The messages are expected: What the message is saying is that we are not serializing the type using Flink's POJO serializer, but we are falling back to Kryo. Since you are registering all the instances of Number that you a

RE: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-26 Thread Hailu, Andreas [Engineering]
Hi Arvid, thanks for the reply. Our stores are world-readable, so I don’t think that it’s an access issue. All of our clients have the stores present through a shared mount as well. I’m able to see the shipped stores in the directory.info output when pulling the YARN logs, and can confirm the a

Re: Flink Metric isBackPressured not available

2021-04-26 Thread David Anderson
The isBackPressured metric is a Boolean -- it reports true or false, rather than 1 or 0. The Flink web UI can not display it (it shows NaN); perhaps the same is true for Datadog. https://issues.apache.org/jira/browse/FLINK-15753 relates to this. Regards, David On Tue, Apr 13, 2021 at 12:13 PM Cl

Re: Writing to Avro from pyflink

2021-04-26 Thread Edward Yang
Hi Dian, Thanks for trying it out, it ruled out a problem with the python code. I double checked the jar path and only included the jar you referenced without any luck. However, I tried creating a python 3.7 (had 3.8) environment for pyflink and the code worked without any errors! On Sun, Apr 25

RE: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-26 Thread Hailu, Andreas [Engineering]
Hey Nico, thanks for your reply. I gave this a try and unfortunately had no luck. // ah -Original Message- From: Nico Kruber Sent: Wednesday, April 21, 2021 1:01 PM To: user@flink.apache.org Subject: Re: [1.9.2] Flink SSL on YARN - NoSuchFileException Hi Andreas, judging from [1], it s

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Dan Hill
Hey Yun and Robert, I'm using Flink v1.11.1. Robert, I'll send you a separate email with the logs. On Mon, Apr 26, 2021 at 12:46 AM Yun Tang wrote: > Hi Dan, > > I think you might use older version of Flink and this problem has been > resolved by FLINK-16753 [1] after Flink-1.10.3. > > > [1] h

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi Dian, It is still not clear to me - does it only allow Python files (.py), or not? Best, Yik San On Mon, Apr 26, 2021 at 9:15 PM Dian Fu wrote: > Hi Yik San, > > 1) what `--pyFiles` is used for: > All the files specified via `--pyFiles` will be put in the PYTHONPATH of > the Python worker d

Re: Confusing docs on python.archives

2021-04-26 Thread Yik San Chan
Hi Dian, I wonder where can we specify the target directory? Best, Yik San On Mon, Apr 26, 2021 at 9:19 PM Dian Fu wrote: > Hi Yik San, > > It should be a typo issue. I guess it should be `If the target directory > name is specified, the archive file will be extracted to a directory with > the

Re: Flink missing Kafka records

2021-04-26 Thread Dan Hill
Hey Robert. Nothing weird. I was trying to find recent records (not the latest). No savepoints (just was running about ~1 day). No checkpoint issues (all successes). I don't know how many are missing. I removed the withIdleness. The other parts are very basic. The text logs look pretty usele

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Dian Fu
Hi Yik San, All the files which could be put in the PYTHONPATH are allowed here, e.g. .zip, .whl, etc. Regards, Dian > 2021年4月27日 上午8:16,Yik San Chan 写道: > > Hi Dian, > > It is still not clear to me - does it only allow Python files (.py), or not? > > Best, > Yik San > > On Mon, Apr 26, 20

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
There are multiple ways to specify the target directory depending on how to specify the python archives. 1) API: add_python_archive(“file:///path/to/py_env .zip", "myenv"), see [1] for more details, 2) configuration: python.archives, e.g. file:///path/to/py_env.zip#myenv 3) command line argument

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
For the command line arguments, it’s documented in https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html > 2021年4月27日 上午10:19,Dian Fu 写道: > > There are multiple ways to specify the target directory depending on how to > specify the python archives. > 1) API: add_python_arch

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi Dian, If that's the case, shall we reword "Attach custom python files for job." into "attach custom files that could be put in PYTHONPATH, e.g., .zip, .whl, etc." Best, Yik San On Tue, Apr 27, 2021 at 10:08 AM Dian Fu wrote: > Hi Yik San, > > All the files which could be put in the PYTHONPA

Re: Deployment/Memory Configuration/Scalability

2021-04-26 Thread Yangze Guo
Hi, Radoslav, > 1. Is it a good idea to have regular savepoints (say on a daily basis)? > 2. Is it possible to have high availability with Per-Job mode? Or maybe I > should go with session mode and make sure that my flink cluster is running a > single job? Yes, we can achieve HA with per-job mo

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Dian Fu
Thanks for the suggestion. It makes sense to me~. > 2021年4月27日 上午10:28,Yik San Chan 写道: > > Hi Dian, > > If that's the case, shall we reword "Attach custom python files for job." > into "attach custom files that could be put in PYTHONPATH, e.g., .zip, .whl, > etc." > > Best, > Yik San > >

Re: Deployment/Memory Configuration/Scalability

2021-04-26 Thread Radoslav Smilyanov
Hi Yangze Guo, Thanks for your reply. > 1. Is it a good idea to have regular savepoints (say on a daily basis)? > > 2. Is it possible to have high availability with Per-Job mode? Or maybe > I should go with session mode and make sure that my flink cluster is > running a single job? > Yes, we can

Re: Watermarks in Event Time Temporal Join

2021-04-26 Thread Leonard Xu
Hello, Maciej > I agree the watermark should pass on versioned table side, because > this is the only way to know which version of record should be used. > But if we mimics behaviour of interval join then main stream watermark > could be skipped. IIRC, rowtime interval join requires the watermark