Re: Using managed keyed state with AsynIo

2020-08-13 Thread Arvid Heise
Hi KristoffSC, you are right that state is not shared across operators - I forgot about that. So the approach would only be valid as is if the state can be properly separated into two independent subtasks. For example, you need the state to find the database key and you store the full entry in Fli

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread narasimha
Thanks, Till. Currently, the instance is getting a timeout error and terminating the TaskManager. Sure, will try native K8s. On Thu, Aug 13, 2020 at 3:12 PM Till Rohrmann wrote: > Hi Narasimha, > > if you are deploying the Flink cluster manually on K8s then there is > no automatic way of stoppin

Re: Tools for Flink Job performance testing

2020-08-13 Thread narasimha
Thanks, Arvid. The guide was helpful in how to start working with Flink. I'm currently exploring SQL/Table API. Will surely come back for queries on it. On Thu, Aug 13, 2020 at 1:25 PM Arvid Heise wrote: > Hi, > > performance testing is quite vague. Usually you start by writing a small > first

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Vijayendra Yadav
Hi Yangze, I tried the following; maybe I am missing something. https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html -yt,--yarnship Run: /usr/lib/flink/bin/flink run -m yarn-cluster -yt ${app_install_path}/conf my KRB5.conf is in ${app_install_path}/conf on master node (loc

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Yangze Guo
Hi, When deploying Flink on Yarn, you can ship krb5.conf with the "--ship" option. Note that this option currently only supports shipping folders. Best, Yangze Guo On Fri, Aug 14, 2020 at 11:22 AM Vijayendra Yadav wrote: > > Any inputs ? > > On Tue, Aug 11, 2020 at 10:34 AM Vijayendra Yadav > wrote:

Re: Avro format in pyFlink

2020-08-13 Thread Xingbo Huang
Hi Rodrigo, For the connectors, Pyflink just wraps the java implementation. And I am not an expert on Avro and corresponding connectors, but as far as I know, DataTypes really cannot declare the type of union you mentioned. Regarding the bytes encoding you mentioned, I actually have no good suggest

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Vijayendra Yadav
Any inputs ? On Tue, Aug 11, 2020 at 10:34 AM Vijayendra Yadav wrote: > Dawid, I was able to resolve the keytab issue by passing the service name, > but now I am facing the KRB5 issue. > > Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: > Failed to create SaslClient with m

Re: k8s job cluster using StatefulSet

2020-08-13 Thread Yang Wang
Hi Alexey, Actually, StatefulSets could also be used to start the JobManager and TaskManager. So why do we suggest using Deployment in the Flink documentation? * StatefulSets require the user to have a persistent volume in the K8s cluster. However, that is not always the case, especially for the unma

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Yang Wang
Hi Kevin, Thanks for sharing more information. You are right. Actually, "too old resource version" is caused by a bug in fabric8 kubernetes-client[1]. It has been fixed in v4.6.1. And we have bumped the kubernetes-client version to v4.9.2 in Flink release-1.11. Also it has been backported to release

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Leonard Xu
Hi, Weizheng > On Aug 13, 2020, at 19:44, Danny Chan wrote: > > tEnv.executeSql would execute the SQL asynchronously, e.g. submitting a job > to the backend cluster with a builtin job name `tEnv.executeSql` is an asynchronous method which will submit the job immediately. If you’re testing in your IDE, yo

Re: Hostname for taskmanagers when running in docker

2020-08-13 Thread Xintong Song
Hi Nikola, I'm not entirely sure about how this happened. Would need some more information to investigate, such as the complete configurations for taskmanagers in your docker compose file, and the taskmanager logs. One quick thing you may try is to explicitly set the configuration option `taskman

Re: Avro format in pyFlink

2020-08-13 Thread rodrigobrochado
The upload of the schema through Avro(avro_schema) worked, but I had to select one type from the union type to put in Schema.field(field_type) inside t_env.connect(). If my dict has long and double values, and I declare Schema.field(DataTypes.Double()), all the int values are cast to double. My m

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Bohinski, Kevin
Might be useful https://stackoverflow.com/a/61437982 Best, kevin From: "Bohinski, Kevin" Date: Thursday, August 13, 2020 at 6:13 PM To: Yang Wang Cc: "user@flink.apache.org" Subject: Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers Hi Got the logs on crash, hopefull

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Bohinski, Kevin
Hi Got the logs on crash, hopefully they help. 2020-08-13 22:00:40,336 ERROR org.apache.flink.kubernetes.KubernetesResourceManager[] - Fatal error occurred in ResourceManager. io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 8617182 (8633230)

Performance Flink streaming kafka consumer sink to s3

2020-08-13 Thread Vijayendra Yadav
Hi Team, I am trying to increase the throughput of my Flink streaming job, which reads from a Kafka source and sinks to S3. Currently it runs fine for small event records, but records with large payloads are processed extremely slowly, at a rate of about 2 TPS. Could you provide some best practices for tuning? Also,

Re: What async database library does the asyncio code example use?

2020-08-13 Thread Marco Villalobos
Thank you! This was very helpful. Sincerely, Marco A. Villalobos > On Aug 13, 2020, at 1:24 PM, Arvid Heise wrote: > > Hi Marco, > > you don't need to use an async library; you could simply write your code in > async fashion. > > I'm trying to sketch the basic idea using any JDBC driver i

Re: What async database library does the asyncio code example use?

2020-08-13 Thread Arvid Heise
Hi Marco, you don't need to use an async library; you could simply write your code in async fashion. I'm trying to sketch the basic idea using any JDBC driver in the following (it's been a while since I used JDBC, so don't take it too literally). private static class SampleAsyncFunction extends
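The idea described above (wrapping a blocking call so that it behaves asynchronously, without an async client library) can be sketched roughly as follows. This is not the code from the truncated message: the class name `BlockingToAsyncLookup` and the stand-in query function are hypothetical, and only JDK classes are used. In a Flink `AsyncFunction`, the returned future would presumably be forwarded to the `ResultFuture` passed to `asyncInvoke`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Flink): runs a blocking lookup, such as a
 * JDBC query, on a dedicated thread pool and exposes the result as a future.
 */
final class BlockingToAsyncLookup<K, V> implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<K, V> blockingQuery;

    BlockingToAsyncLookup(int threads, Function<K, V> blockingQuery) {
        // The pool size bounds how many blocking calls run concurrently.
        this.pool = Executors.newFixedThreadPool(threads);
        this.blockingQuery = blockingQuery;
    }

    /** Returns immediately; the blocking call runs on a pool thread. */
    CompletableFuture<V> lookupAsync(K key) {
        return CompletableFuture.supplyAsync(() -> blockingQuery.apply(key), pool);
    }

    @Override
    public void close() {
        // Stop accepting new work; already submitted lookups still finish.
        pool.shutdown();
    }

    public static void main(String[] args) {
        // Assumption: the lambda stands in for a real blocking database query.
        try (BlockingToAsyncLookup<Integer, String> lookup =
                 new BlockingToAsyncLookup<>(4, id -> "row-" + id)) {
            lookup.lookupAsync(7).thenAccept(System.out::println).join();
        }
    }
}
```

The thread pool size is the main tuning knob in this sketch: it caps the number of in-flight blocking calls, which is the part an async client library would otherwise handle.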

Re: Flink Parquet Streaming FileSink with scala case class with optional fields error

2020-08-13 Thread Arvid Heise
Hi Vikash, The error is coming from Parquet itself in conjunction with Avro (which is used to infer the schema of your scala class). The inferred schema is { "fields": [ { "name": "level", "type": "string" }, { "name": "time_stamp",

[Announce] Flink Forward Global Program is now Live

2020-08-13 Thread Seth Wiesman
Hi Everyone *The Flink Forward Global 2020 program is now online* and with 2 full days of exciting Apache Flink content, curated by our program committee[1]! Join us on October 21-22 to learn more about the newest technology updates, and hear use cases from Intel, Razorpay, Workday, Microsoft, a

Re: Status of a job when a kafka source dies

2020-08-13 Thread Nick Bendtner
Hi Piotr, Sorry for the late reply. So the poll does not throw an exception when a broker goes down. In Spring they solve it by generating an event [1] whenever this happens, and you can intercept this event. consumer.timeout.ms helps to some extent, but if the source topic does not receive

Re: Client's documentation for deploy and run remotely.

2020-08-13 Thread Jacek Grzebyta
It seems the documentation might be outdated. I probably found what I wanted in a different request: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Submit-Flink-1-11-job-from-java-td37245.html Cheers, Jacek On Thu, 13 Aug 2020 at 14:23, Jacek Grzebyta wrote: > Hi, > > I have

Re: Flink cluster deployment strategy

2020-08-13 Thread sidhant gupta
Thanks, I will check it out. On Thu, 13 Aug, 2020, 7:55 PM Arvid Heise, wrote: > Hi Sidhant, > > If you are starting fresh with Flink, I strongly recommend to skip ECS and > EMR and directly go to a kubernetes-based solution. Scaling is much easier > on K8s, there will be some kind of autoscalin

Re: Question about ParameterTool

2020-08-13 Thread Arvid Heise
Since Picocli does not have any dependencies of its own, it's safe to use. It's a bit quirky to use with Scala, but it's imho the best CLI library for Java. The only downside, as Chesnay mentioned, is the increased jar size. Also note that Flink is not graal-ready. Best, Arvid On Wed, Aug 12, 20

Re: Flink cluster deployment strategy

2020-08-13 Thread Arvid Heise
Hi Sidhant, If you are starting fresh with Flink, I strongly recommend to skip ECS and EMR and directly go to a kubernetes-based solution. Scaling is much easier on K8s, there will be some kind of autoscaling coming in the next release, and the best of it all: you even have the option to go to a d

Re: Flink job percentage

2020-08-13 Thread Arvid Heise
Hi Flavio, This is a daunting task to implement properly. There is an easy fix in related workflow systems though. Assuming that it's a rerunning task, then you simply store the run times of the last run, use some kind of low-pass filter (=decaying average) and compare the current runtime with the
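The suggestion above (store past run times, smooth them with a decaying average, and compare the current runtime against that) might look roughly like this. It is a minimal sketch under stated assumptions: the class `RuntimeProgressEstimator`, the smoothing factor, and the cap at 99% are all hypothetical choices, not something from the original message or from Flink.

```java
import java.util.OptionalDouble;

/**
 * Hypothetical helper (not a Flink API): estimates the progress of a
 * recurring job from an exponentially decaying average of past run times.
 */
final class RuntimeProgressEstimator {
    private final double alpha;     // smoothing factor in (0, 1]; higher forgets old runs faster
    private double smoothedMillis;  // decaying average of completed run times
    private boolean hasHistory;

    RuntimeProgressEstimator(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Feeds the duration of a finished run into the decaying average. */
    void recordCompletedRun(long runMillis) {
        if (!hasHistory) {
            smoothedMillis = runMillis;
            hasHistory = true;
        } else {
            smoothedMillis = alpha * runMillis + (1.0 - alpha) * smoothedMillis;
        }
    }

    /** Estimated percentage for a run in progress; empty without usable history. */
    OptionalDouble estimatePercent(long elapsedMillis) {
        if (!hasHistory || smoothedMillis <= 0.0) {
            return OptionalDouble.empty();
        }
        // Capped at 99 (an assumption): a run that is still going is not done.
        return OptionalDouble.of(Math.min(99.0, 100.0 * elapsedMillis / smoothedMillis));
    }

    public static void main(String[] args) {
        RuntimeProgressEstimator estimator = new RuntimeProgressEstimator(0.5);
        estimator.recordCompletedRun(1000);
        estimator.recordCompletedRun(2000); // average becomes 1500 ms
        System.out.println(estimator.estimatePercent(750).getAsDouble()); // prints 50.0
    }
}
```

Such an estimate is only as good as the assumption that consecutive runs take similar time; a run over much more data than usual would sit at the cap for a long while.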

Client's documentation for deploy and run remotely.

2020-08-13 Thread Jacek Grzebyta
Hi, I have a problem with some examples in the documentation. In particular, I mean this paragraph: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/parallel.html#client-level The code uses classes such as Client and RemoteExecutor. I found those classes in the

Re: Using managed keyed state with AsynIo

2020-08-13 Thread KristoffSC
Hi Arvid, thank you for the response. Yeah I tried to run my job shortly after posting my message and I got "State is not supported in rich async function" ;) I came up with a solution that would solve my initial problem - concurrent/Async problem of processing messages with the same key but unfor

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Danny Chan
Weizheng ~ tEnv.executeSql executes the SQL asynchronously, e.g. by submitting a job to the backend cluster with a builtin job name; tEnv.executeSql itself returns a JobResult immediately with a constant affected rows count of -1. Best, Danny Chan On Aug 13, 2020 at 3:46 PM +0800, Lu Weizheng wrote

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread Till Rohrmann
Hi Narasimha, if you are deploying the Flink cluster manually on K8s then there is no automatic way of stopping the TaskExecutor/TaskManager pods. This is something you have to do manually (similar to a standalone deployment). The only clean up mechanism is the automatic termination of the TaskMan

Re: getting error after upgrade Flink 1.11.1

2020-08-13 Thread Kostas Kloudas
Hi Dasraj, Yes, I would recommend using Public and, if necessary, PublicEvolving APIs as they provide better guarantees for future maintenance. Unfortunately there are no docs about which APIs are Public or PublicEvolving but you can see the annotations of the classes in the source code. I guess

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread Kostas Kloudas
Hi Narasimha, I am not sure why the TMs are not shutting down, as Yun said, so I am cc'ing Till here as he may be able to shed some light. For the application mode, the page in the documentation that you pointed is the recommended way to deploy an application in application mode. Cheers, Kostas

Re: k8s job cluster using StatefulSet

2020-08-13 Thread Arvid Heise
Hi Alexey, I don't see any issue in using stateful sets immediately. I'd recommend using one of the K8s operators or Ververica's community edition [1] though if you start with a new setup as they may solve even more issues that you might experience in the future. [1] https://www.ververica.com/ge

Re: Using managed keyed state with AsynIo

2020-08-13 Thread Arvid Heise
Hi KristoffSC, Afaik asyncIO does not support state operations at all because of your mentioned issues (RichAsyncFunction fails if you access state). I'd probably solve it by having a map or process function before and after the asyncIO for the state operations. If you enable object reuse, perfor

Re: Tools for Flink Job performance testing

2020-08-13 Thread Arvid Heise
Hi, performance testing is quite vague. Usually you start by writing a small first version of your pipeline and check how well the computation scales on your data. Flink's web UI [1] already helps quite well for the first time. Usually you'd also add some metric system and look for advanced metric

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Lu Weizheng
Thanks Timo, So there is no need to call the execute() method in Flink SQL if I do all the things from source to sink in SQL. Best Regards, Lu > On Aug 13, 2020, at 3:41 PM, Timo Walther wrote: > > Hi Lu, > > `env.execute("table api");` is not necessary after FLIP-84 [1]. Every method > that has `execute` in its name w

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Timo Walther
Hi Lu, `env.execute("table api");` is not necessary after FLIP-84 [1]. Every method that has `execute` in its name will immediately execute a job. Therefore your `env.execute` has an empty pipeline. Regards, Timo [1] https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

Re: State Processor API to boot strap keyed state for Stream Application.

2020-08-13 Thread Arvid Heise
For future readers: this thread has been resolved in "Please help, I need to bootstrap keyed state into a stream" on the user mailing list asked by Marco. On Fri, Aug 7, 2020 at 11:52 PM Marco Villalobos wrote: > I have read the documentation and various blogs that state that it is > possible to

Re: Is there a way to start a timer without ever receiving an event?

2020-08-13 Thread Timo Walther
What you can do is create an initial control stream e.g. using `StreamExecutionEnvironment.fromElements()` and either use `union(controlStream, actualStream)` or use `actualStream.connect(controlStream)`. Regards, Timo On 12.08.20 18:15, Andrey Zagrebin wrote: I do not think so. Each timer

Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Lu Weizheng
Hi, I am using Flink 1.11 SQL with Java. All my operations are in SQL. I create source tables and insert the results into sink tables. No other Java operators. I execute it in IntelliJ. I can get the final result in the sink tables. However I get the following error. I am not sure whether it is a bug or th

Re: Flink CPU load metrics in K8s

2020-08-13 Thread Arvid Heise
Hi Abhinav, according to [1], you need 8u261 for the OperatingSystemMXBean to work as expected. [1] https://bugs.openjdk.java.net/browse/JDK-8242287 On Thu, Aug 13, 2020 at 1:10 AM Bajaj, Abhinav wrote: > Thanks Xintong for your input. > > > > From the information I could find, I understand th