Re: Pushing metrics to Influx from Flink 1.9 on AWS EMR(5.28)

2020-08-12 Thread bat man
Anyone who has made metrics integration to external systems for flink running on AWS EMR, can you share if its a configuration issue or EMR specific issue. Thanks, Hemant On Wed, Aug 12, 2020 at 9:55 PM bat man wrote: > An update in the yarn logs I could see the below - > > Classpath: > *lib/fl

Re: Avro format in pyFlink

2020-08-12 Thread Rodrigo de Souza Oliveira Brochado
Thank you Xingbo. I've managed to get it working adding the Avro jar and the three artifacts from the *com.fasterxml.jackson.core* group [1]. Is it required to also add the jackson-mapper-asl jar? About joda-time, I suppose that it'll not be required, as I won't use date types in my Avro schema.

Re: Change in sub-task id assignment from 1.9 to 1.10?

2020-08-12 Thread Zhu Zhu
Hi Ken, There were no such changes in my mind. And in Flink there was no designed logic to scatter subtasks of the same operator into different taskmanagers. One workaround to solve your problem could be to increase the parallelism of your source vertex to be no smaller than no other operator so

Re: Flink CPU load metrics in K8s

2020-08-12 Thread Bajaj, Abhinav
Thanks Xintong for your input. From the information I could find, I understand the JDK version 1.8.0_212 we use includes the docker/container support. I also had a quick test inside the docker image using the below – Runtime.getRuntime().availableProcessors() It showed the right number of CPU co

Re: Please help, I need to bootstrap keyed state into a stream

2020-08-12 Thread Marco Villalobos
Hi Seth, Thank you for the advice. The solution you mentioned is exactly what I did. I wrote a small tutorial that explains how to repeat that pattern. You can read about my solution at https://github.com/minmay/flink-patterns/tree/master/bootstrap-keyed-state-into-stream

Re: What async database library does the asyncio code example use?

2020-08-12 Thread KristoffSC
Hi, I do believe that example from [1] where you see DatabaseClient is just a hint that whatever library you would use (db or REST based or whatever else) should be asynchronous or should actually not block. It does not have to be non blocking until it runs on its own thread pool that will return a

Re: What async database library does the asyncio code example use?

2020-08-12 Thread Marco Villalobos
So, I searched for an async DatabaseClient class, and I found r2dbc. Is that it? https://docs.spring.io/spring-data/r2dbc/docs/1.1.3.RELEASE/reference/html On Wed, Aug 12, 2020 at 9:31 AM Marco Villalobos wrote: > I would like to enrich my stream with database calls as documented at: > > > htt

Re: Please help, I need to bootstrap keyed state into a stream

2020-08-12 Thread Seth Wiesman
Just to summarize the conversation so far: The state processor api reads data from a 3rd party system - such as JDBC in this example - and generates a savepoint file that is written out to some DFS. This savepoint can then be used to when starting a flink streaming application. It is a two-step p

What async database library does the asyncio code example use?

2020-08-12 Thread Marco Villalobos
I would like to enrich my stream with database calls as documented at: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/asyncio.html What async database library does

Re: Pushing metrics to Influx from Flink 1.9 on AWS EMR(5.28)

2020-08-12 Thread bat man
An update in the yarn logs I could see the below - Classpath: *lib/flink-metrics-influxdb-1.9.0.jar:lib/flink-shaded-hadoop-2-uber-2.8.5-amzn-5-7.0.jar:lib/flink-table-blink_2.11-1.9.0.jar:lib/flink-table_2.11-1.9.0.jar:lib/log4j-1.2.17.jar:lib/slf4j-log4j12-1.7.15.jar:log4j.properties:plugins/inf

Re: Is there a way to start a timer without ever receiving an event?

2020-08-12 Thread Andrey Zagrebin
I do not think so. Each timer in KeyedProcessFunction is associated with the key. The key is implicitly set into the context from the record which is currently being processed. On Wed, Aug 12, 2020 at 8:00 AM Marco Villalobos wrote: > In the Stream API KeyedProcessFunction,is there a way to star

Re: JM & TM readiness probe

2020-08-12 Thread Andrey Zagrebin
Hi Alexey, As far as I know, TaskManager does not expose the REST API. ResourceManager redirects some REST calls to TaskManager [1]: /taskmanagers/:taskmanagerid/metrics /taskmanagers/:taskmanagerid/thread-dump These calls may be not so lightweight. I do not know others or how you ask e.g. the sta

Pushing metrics to Influx from Flink 1.9 on AWS EMR(5.28)

2020-08-12 Thread bat man
Hello Experts, I am running Flink - 1.9.0 on AWS EMR(emr-5.28.1). I want to push metrics to Influxdb. I followed the documentation[1]. I added the configuration to /usr/lib/flink/conf/flink-conf.yaml and copied the jar to /usr/lib/flink//lib folder on master node. However, I also understand that t

[DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-12 Thread Kostas Kloudas
Hi all, As described in FLIP-131 [1], we are aiming at deprecating the DataSet API in favour of the DataStream API and the Table API. After this work is done, the user will be able to write a program using the DataStream API and this will execute efficiently on both bounded and unbounded data. But

Re: Flink CPU load metrics in K8s

2020-08-12 Thread Xintong Song
Hi Abhinav, Do you know how many total cpus does the physical machine have where the kubernetes container is running? I'm asking because I suspect whether JVM is aware that only 1 cpu is configured for the container. It does not work like JVM understands how many cpu are configured and controls i

Re: Using Event Timestamp sink get's back with machine timezone

2020-08-12 Thread Timo Walther
Hi Faye, the problem lies in the wrong design of JDK's java.sql.Timestamp. You can also find a nice summary in the answer here [1]. java.sql.Timestamp is timezone dependent. Internally, we subtract/normalize the timezone and work with the UNIX timestamp. Beginning from Flink 1.9 we are using

Re: how to add a new runtime operator

2020-08-12 Thread Timo Walther
Hi Vincent, we don't have a step by step guide for adding new operators. Most of the important operations are exposed via DataStream API. Esp. ProcessFunction [1] fits for most complex use cases with access to the primitives such as time and state. What kind of operator is missing for your u

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
(adding back user mailing list) Yes, that is correct. Flink 1.8.0 is causing the problem here. 1. Upgrade Flink to 1.11.1 without upgrading the connector 2. Take a new savepoint 3. Upgrade connector to the universal one 4. Restore upgraded job from the new savepoint (2) If it doesn't work, pleas

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
Hi Nikola, Which Flink version are you using? Can you describe step by step what you are doing? This error that you have should have been fixed in Flink 1.9.0+ [1], so if you are using an older version of Flink, please first upgrade Flink - without upgrading the job, then upgrade the connector.

Hostname for taskmanagers when running in docker

2020-08-12 Thread Nikola Hrusov
Hello, After upgrading the docker image for flink to 1.11.1 from 1.9 the hostname of the taskmanagers reported to our metrics show as IPs (e.g. 10.0.23.101) instead of hostnames. In the docker compose file we specify the hostname as such: *hostname: "taskmanager-{{ '{{' }}.Node.Hostname{{ '}}'