Re: where does flink store the intermediate results of a join and what is the key?

2020-01-21 Thread kant kodali
Hi Jark, 1) Shouldn't it be col1 to a List of rows? Multiple rows can have the same join key, right? 2) Can I use the State Processor API from an external application to query the intermediate results in near real-time? I thought

Re: Influxdb reporter not honouring the metrics scope

2020-01-21 Thread Gaurav Singhania
Thanks for the response and the fix. On Wed, 22 Jan 2020 at 01:43, Chesnay Schepler wrote: > The solution for 1.9 and below is to create a customized version of the > influx db reporter which excludes certain tags. > > On 21/01/2020 19:27, Yun Tang wrote: > > Hi, Gaurav > > InfluxDB metrics repo

Re: Custom Metrics outside RichFunctions

2020-01-21 Thread Yun Tang
Hi David, FlinkKafkaConsumer is itself a RichParallelSourceFunction, and you could call the function below to register your metrics group: getRuntimeContext().getMetricGroup().addGroup("MyMetrics").counter("myCounter") Best, Yun Tang From: David Magalhães Sent: Tue
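
A minimal sketch of the registration pattern described above, placed in a RichMapFunction for illustration (the class and field names are made up; "MyMetrics"/"myCounter" come from the thread):

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Sketch: registering a custom counter from any rich function via the runtime context.
public class CountingMapper extends RichMapFunction<String, String> {
    private transient Counter myCounter;

    @Override
    public void open(Configuration parameters) {
        myCounter = getRuntimeContext()
                .getMetricGroup()
                .addGroup("MyMetrics")
                .counter("myCounter");
    }

    @Override
    public String map(String value) {
        myCounter.inc(); // count every record that passes through
        return value;
    }
}
```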

Re: Flink configuration on Docker deployment

2020-01-21 Thread Yang Wang
Hi Soheil, Since you are not using any container orchestration framework (e.g. docker-compose, Kubernetes, Mesos), you need to manually update the flink-conf.yaml in your Docker images. Usually, it is located in the path "/opt/flink/conf". A Docker volume could also be used to override the flink c

Re: Flink Job Submission Fails even though job is running

2020-01-21 Thread tison
I guess it is a JM-internal error which crashes the Dispatcher, or a race condition so that the returned future never completes, possibly related to a JDK bug. But again, I have never had a log in this case, so I cannot conclude anything. Best, tison. tison wrote on Wed, Jan 22, 2020 at 10:49 AM: > It is a known issue repo

Re: Flink Job Submission Fails even though job is running

2020-01-21 Thread tison
It is a known issue reported multiple times: if you are on an early JDK 1.8.x version, upgrade to a later bugfix version and the issue will vanish. I have never had a JM-side log when this issue was reported, so I'm sorry I am unable to explain more... Best, tison. Yang Wang wrote on Wed, Jan 22, 2020 at 10:46 AM: >

Re: Replacing a server in Zookeeper Quorum

2020-01-21 Thread tison
Good to know :-) Best, tison. Aaron Langford wrote on Wed, Jan 22, 2020 at 10:44 AM: > My apologies, I ended up resolving this through experimentation. AWS > replaces master nodes with the same internal DNS names, so configurations > need not be changed. > > Aaron > > > On Tue, Jan 21, 2020, 6:33 PM Yang Wan

Re: Flink Job Submission Fails even though job is running

2020-01-21 Thread Yang Wang
The "web.timeout" will be used for all web monitor asynchronous operations, including the "DispatcherGateway.submitJob" in the "JobSubmitHandler". So when you increase the timeout, does it still could not work? Best, Yang satya brat 于2020年1月21日周二 下午8:57写道: > How does web.timeout help hear?? The

Re: Replacing a server in Zookeeper Quorum

2020-01-21 Thread Aaron Langford
My apologies, I ended up resolving this through experimentation. AWS replaces master nodes with the same internal DNS names, so configurations need not be changed. Aaron On Tue, Jan 21, 2020, 6:33 PM Yang Wang wrote: > Hi Aaron, > > I think it is not the responsibility of Flink. Flink uses zoo

Re: Replacing a server in Zookeeper Quorum

2020-01-21 Thread tison
I second Yang that setting a static IP for the EMR master node would be a workaround. Even in the ZooKeeper world, reconfig is a new and immature feature (since 3.5.3), while Flink uses ZooKeeper 3.4.x. It would be a breaking change if we "just" upgraded the ZooKeeper version, but hopefully the Flink community kee

Re: Replacing a server in Zookeeper Quorum

2020-01-21 Thread Yang Wang
Hi Aaron, I think this is not the responsibility of Flink. Flink uses the ZooKeeper Curator client to connect to the ZooKeeper servers. If multiple ZooKeeper servers are specified, it has an automatic retry mechanism. However, your problem is that the IP address will change when the EMR instance restarts. Currently, Flink can not support

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-21 Thread Jark Wu
Hi Kant, 1) Yes, it will be stored in the RocksDB state backend. 2) In the old planner, the left state is the same as the right state, which are both `>>`. It is a 2-level map structure, where `col1`, the join key, is the first-level key of the state. The key of the MapState is the input row,
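
An illustrative sketch of that structure (not the planner's actual code): one join side modelled as keyed state, where the stream is keyed by the join key and a MapState maps each input row to how often it has been seen. All names and types are assumptions for the illustration:

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Keyed by the join key (col1); the MapState plays the role of the second map level.
public class JoinSideState extends KeyedProcessFunction<String, Row, Row> {
    private transient MapState<Row, Long> rowsSeen;

    @Override
    public void open(Configuration parameters) {
        MapStateDescriptor<Row, Long> desc =
                new MapStateDescriptor<>("rows-seen", Types.ROW(Types.STRING), Types.LONG);
        rowsSeen = getRuntimeContext().getMapState(desc);
    }

    @Override
    public void processElement(Row row, Context ctx, Collector<Row> out) throws Exception {
        Long count = rowsSeen.get(row);
        rowsSeen.put(row, count == null ? 1L : count + 1L); // count occurrences of this exact row
    }
}
```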

Re: Influxdb reporter not honouring the metrics scope

2020-01-21 Thread Chesnay Schepler
The solution for 1.9 and below is to create a customized version of the InfluxDB reporter which excludes certain tags. On 21/01/2020 19:27, Yun Tang wrote: Hi, Gaurav. The InfluxDB metrics reporter has a fixed format for reporting metrics which cannot be controlled by the scope. If you don't wan

Re: Influxdb reporter not honouring the metrics scope

2020-01-21 Thread Yun Tang
Hi, Gaurav. The InfluxDB metrics reporter has a fixed format for reporting metrics which cannot be controlled by the scope. If you don't want some tags stored, you can try `metrics.reporter..scope.variables.excludes`, which was introduced in Flink 1.10 [1], to exclude specific variables. However,
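
A hedged sketch of what using that option could look like if set programmatically; the reporter name "influxdb" and the excluded variable are placeholders, and in practice the same key would normally go into flink-conf.yaml:

```java
import org.apache.flink.configuration.Configuration;

public class ReporterExcludeExample {
    public static void main(String[] args) {
        // Option key from the thread above (Flink 1.10+); excluded scope
        // variables are no longer attached as tags by the reporter.
        Configuration conf = new Configuration();
        conf.setString("metrics.reporter.influxdb.scope.variables.excludes", "<variable-to-exclude>");
        System.out.println(conf);
    }
}
```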

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-21 Thread Aaron Langford
Senthil, One of the key steps in debugging this for me was enabling debug level logs on my cluster, and then looking at the logs in the resource manager. The failure you are after happens before the exceptions you have reported here. When your Flink application is starting, it will attempt to load

Replacing a server in Zookeeper Quorum

2020-01-21 Thread Aaron Langford
Hello Flink Community, I'm working on an HA setup of Flink 1.8.1 on AWS EMR and have some questions about how Flink interacts with ZooKeeper when one of the servers in the quorum specified in flink-conf.yaml goes down and is replaced by a machine with a new IP address. Currently, I configure high-
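
For context, a minimal sketch of the kind of HA configuration the question refers to, written against the Configuration API; the hostnames are placeholders, and the same keys normally live in flink-conf.yaml:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HighAvailabilityOptions;

public class HaConfigExample {
    public static void main(String[] args) {
        // Equivalent of "high-availability: zookeeper" and
        // "high-availability.zookeeper.quorum: ..." in flink-conf.yaml.
        // If a quorum member is replaced by a host with a different address,
        // this list has to be updated accordingly.
        Configuration conf = new Configuration();
        conf.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
        conf.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM,
                "zk-host-1:2181,zk-host-2:2181,zk-host-3:2181");
        System.out.println(conf);
    }
}
```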

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-21 Thread Senthil Kumar
Yang, I appreciate your help! Please let me know if I can provide any other info. I resubmitted my executable jar file as a step to the Flink EMR cluster and here are all the exceptions. I see two of them. I fished them out of /var/log/Hadoop//syslog 2020-01-21 16:31:37,587 ERROR org.apache.

Re: How to get Task metrics with StatsD metric reporter?

2020-01-21 Thread John Smith
I think I figured it out. I used netcat to debug. I think the Telegraf StatsD server doesn't support spaces in the stats names. On Mon, 20 Jan 2020 at 12:19, John Smith wrote: > Hi, running Flink 1.8 > > I'm declaring my metric as such. > > invalidList = getRuntimeContext() > .getMetricGro
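
A small sketch of the workaround implied above: register the metric under a name with no spaces so a Telegraf/StatsD endpoint can ingest it. The function and field names are illustrative, not taken verbatim from the thread:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.util.Collector;

// Counter name "invalidList" contains no spaces, so it survives StatsD reporting.
public class ValidatingFlatMap extends RichFlatMapFunction<String, String> {
    private transient Counter invalidList;

    @Override
    public void open(Configuration parameters) {
        invalidList = getRuntimeContext().getMetricGroup().counter("invalidList");
    }

    @Override
    public void flatMap(String value, Collector<String> out) {
        if (value == null || value.isEmpty()) {
            invalidList.inc(); // record a rejected element
            return;
        }
        out.collect(value);
    }
}
```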

Re: Question regarding checkpoint/savepoint and State Processor API

2020-01-21 Thread Jin Yi
Hi Seth, Thanks for the prompt response! Regarding my second question, once I have converted the existing savepoint to a DataSet, how can I convert the DataSet into BroadcastState? For example, in my BroadcastProcessFunction: @Override public void processBroadcastElement(String key, Context contex

Call for presentations for ApacheCon North America 2020 now open

2020-01-21 Thread Rich Bowen
Dear Apache enthusiast, (You’re receiving this message because you are subscribed to one or more project mailing lists at the Apache Software Foundation.) The call for presentations for ApacheCon North America 2020 is now open at https://apachecon.com/acna2020/cfp ApacheCon will be held at

GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-21 Thread Mark Harris
Hi, We're using Flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs Hadoop v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager (and jobs that shou

Re: Flink Performance

2020-01-21 Thread Dharani Sudharsan
Thanks David, but I don't see any solutions provided there. On Jan 21, 2020, at 7:13 PM, David Magalhães <speeddra...@gmail.com> wrote: I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone c

Re: Flink Performance

2020-01-21 Thread David Magalhães
I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone complains about a performance drop in keyBy. On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan <dharani.sudhar...@outlook.in> wrote: > Hi All, > > Currently

Flink Performance

2020-01-21 Thread Dharani Sudharsan
Hi All, Currently I'm running a Flink streaming application with the configuration below. Task slots: 45 Task Managers: 3 Job Manager: 1 CPUs: 20 per machine My sample code below: Process Stream: datastream.flatmap().map().process().addsink Data size: 330GB approx. Raw Stream: datastream.ke
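
A bare skeleton of the quoted "process stream" shape (flatMap -> map -> process -> sink); the source, operators and sink are placeholders, not the original job:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.util.Collector;

public class ProcessStreamSkeleton {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "b", "c")                       // placeholder source
           .flatMap((String v, Collector<String> out) -> out.collect(v))
           .returns(String.class)                             // lambda output type hint
           .map(String::toUpperCase)
           .process(new ProcessFunction<String, String>() {
               @Override
               public void processElement(String value, Context ctx, Collector<String> out) {
                   out.collect(value);
               }
           })
           .addSink(new DiscardingSink<>());                  // placeholder sink
        env.execute("process-stream-skeleton");
    }
}
```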

Re: Flink Job Submission Fails even though job is running

2020-01-21 Thread satya brat
How does web.timeout help here? The issue is with respect to the Akka dispatcher timing out. The job is submitted to the task managers but the response doesn't reach the client. On Tue, Jan 21, 2020 at 12:34 PM Yang Wang wrote: > Hi satya, > > Maybe the job has been submitted to Dispatcher successfu

Flink configuration on Docker deployment

2020-01-21 Thread Soheil Pourbafrani
Hi, I need to set up a Flink cluster using Docker (and not using docker-compose). I could successfully start the jobmanager and taskmanager, but the problem is I have no idea how to change the default configuration for them. For example, in the case of giving 8 slots to the taskmanager or cha

Re: Implementing a tick service

2020-01-21 Thread Dominik Wosiński
Hey, you have access to the context in `onTimer`, so you can easily reschedule the timer when it fires. Best, Dom.
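
A minimal sketch of that pattern: a processing-time timer that re-registers itself from `onTimer`, producing a periodic tick. The one-second interval and the tick payload are assumptions for the illustration:

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TickFunction extends KeyedProcessFunction<String, String, String> {
    private static final long INTERVAL_MS = 1000L;

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        // Start the tick cycle for this key.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + INTERVAL_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("tick");
        // onTimer has access to the timer service through its context,
        // so the timer can simply be rescheduled here.
        ctx.timerService().registerProcessingTimeTimer(timestamp + INTERVAL_MS);
    }
}
```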

Re: Implementing a tick service

2020-01-21 Thread Benoît Paris
Hello all! Please disregard the last message; I used Thread.sleep() and Stateful Source Functions. But just out of curiosity, can processing-time Timers get rescheduled inside the onTim

[no subject]

2020-01-21 Thread Ankush Khanna

where does flink store the intermediate results of a join and what is the key?

2020-01-21 Thread kant kodali
Hi All, If I run a query like this StreamTableEnvironment.sqlQuery("select * from table1 join table2 on table1.col1 = table2.col1") 1) Where will Flink store the intermediate result? Imagine flink-conf.yaml says state.backend = 'rocksdb' 2) If the intermediate results are stored in RocksDB, then
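
For reference, a minimal sketch of the setup the question assumes (requires the flink-statebackend-rocksdb dependency); the checkpoint URI and the trivial pipeline are placeholders, and this is the programmatic equivalent of "state.backend: rocksdb" in flink-conf.yaml:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keyed state (such as join state) now lives in RocksDB on the TaskManagers,
        // with checkpoints written to the configured URI.
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints"));
        env.enableCheckpointing(60_000L);
        env.fromElements("a", "b").print(); // placeholder pipeline
        env.execute("rocksdb-backend-example");
    }
}
```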

Re: Questions of "State Processing API in Scala"

2020-01-21 Thread Izual
Sorry for the wrong post. > This can probably be confirmed by looking at the exception stack trace. > Can you post a full copy of that? I missed the history jobs, but I think you are right. When I debugged the program to find the reason, I came across this code snippet. ``` TypeSerializerSchemaCompatibility result

about registering completion function for worker shutdown

2020-01-21 Thread Dominique De Vito
Hi, For a Flink batch job, some value are writing to Kafka through a Producer. I want to register a hook for closing (at the end) the Kafka producer a worker is using hook to be executed, of course, on worker side. Is there a way to do so ? Thanks. Regards, Dominique