RE: Flink Conf "yarn.flink-dist-jar" Question

2020-03-06 Thread Hailu, Andreas
Hi Tison, thanks for the reply. I’ve replied to the ticket. I’ll be watching it as well. // ah From: tison Sent: Friday, March 6, 2020 1:40 PM To: Hailu, Andreas [Engineering] Cc: user@flink.apache.org Subject: Re: Flink Conf "yarn.flink-dist-jar" Question FLINK-13938 seems a bit different th

Understanding n LIST calls as part of checkpointing

2020-03-06 Thread Piyush Narang
Hi folks, I was trying to debug a job which was taking 20-30s to checkpoint data to Azure FS (compared to typically < 5s) and as part of doing so, I noticed something that I was trying to figure out a bit better. Our checkpoint path is as follows: my_user/featureflow/foo-datacenter/cluster_name

Re: Flink Conf "yarn.flink-dist-jar" Question

2020-03-06 Thread tison
FLINK-13938 seems a bit different from your requirement. The one that matches exactly is FLINK-14964. I'd appreciate it if you could share your opinion on the JIRA ticket. Best, tison. tison wrote on Sat, Mar 7, 2020 at 2:35 AM: > Yes your requirement is exactly t

Re: Flink Conf "yarn.flink-dist-jar" Question

2020-03-06 Thread tison
Yes your requirement is exactly taken into consideration by the community. We currently have an open JIRA ticket for this specific feature[1], and work on loosening the constraint on the flink-jar scheme to support DFS locations should happen. Best, tison. [1] https://issues.apache.org/jira/browse/FLINK

Flink Conf "yarn.flink-dist-jar" Question

2020-03-06 Thread Hailu, Andreas
Hi, We noticed that every time an application runs, it uploads the flink-dist artifact to the /user//.flink HDFS directory. This causes a user disk space quota issue as we submit thousands of apps to our cluster an hour. We had a similar problem with our Spark applications where it uploaded the

Re: History server UI not working

2020-03-06 Thread pwestermann
I am seeing this error in firefox: ERROR TypeError: "this.statusService.configuration.features is undefined" t http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1 qr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1 Gr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1

Re: History server UI not working

2020-03-06 Thread Robert Metzger
I also suspect a problem with the UI update. Do the Developer Tools of the browser show any error messages? On Thu, Mar 5, 2020 at 7:00 AM Yang Wang wrote: > If all the rest api could be viewed successfully, then the reason may be > js cache. > You could try to force a refresh(e.g. Cmd+Shf

Re: Setting the operator-id to measure percentile latency over several jobs

2020-03-06 Thread Robert Metzger
@Chesnay Schepler : Does it make sense to file a ticket to add the operator name to the latency metrics as well? On Thu, Mar 5, 2020 at 4:31 PM Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > thanks! I was wondering why the operator name is not implemented for the > latency metrics, bec

Re: EXTERNAL: Re: Should I use a Sink or Connector? Or Both?

2020-03-06 Thread Castro, Fernando C.
Arvid, thank you, that was it! After setting these properties on my Elasticsearch connector, I was able to see the records upserting into ES! .bulkFlushMaxActions(2) .bulkFlushInterval(1000L) Thank you, Fernando From: Arvid Heise Date: Thursday, March 5, 2020 at 2:27 AM To: "Castro, Fernando C

Re: Flink Deployment failing with RestClientException

2020-03-06 Thread Robert Metzger
Hey Samir, can you try setting the following configuration parameter (make sure the JobManager log confirms that the changed value is in effect) web.timeout: 30 This might uncover the underlying problem (as we are waiting longer for the underlying issue to timeout). Are you able to upgrade t

Re: How to print the aggregated state everytime it is updated?

2020-03-06 Thread Robert Metzger
Hey, I don't think you need to use a window operator for this use case. A reduce (or fold) operation should be enough: https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/ On Fri, Mar 6, 2020 at 11:50 AM kant kodali wrote: > Hi, > > Thanks for this. so how can I emulate
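A minimal sketch of the reduce idea from this reply, written in plain Python rather than against the Flink API. It keeps one aggregate per key and emits the updated value for every incoming record, so no window is involved. The function name `rolling_reduce` and the sample records are hypothetical and only illustrate the behaviour.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple


def rolling_reduce(
    records: Iterable[Tuple[Hashable, int]],
    reduce_fn: Callable[[int, int], int],
) -> Iterator[Tuple[Hashable, int]]:
    """Yield the updated aggregate for a key each time a record for it arrives."""
    state: Dict[Hashable, int] = {}  # one aggregate per key, kept indefinitely
    for key, value in records:
        if key in state:
            state[key] = reduce_fn(state[key], value)
        else:
            state[key] = value  # first record for a key becomes the initial aggregate
        yield key, state[key]


if __name__ == "__main__":
    events = [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 5)]
    for key, total in rolling_reduce(events, lambda x, y: x + y):
        print(key, total)
```

Every input record produces exactly one output record carrying the current aggregate for its key, which is the "print on every update" behaviour asked about in this thread.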

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread tison
Hi Jingsong, I think your proposal is "--classpath can occur behind the jar file". Generally speaking, I agree that it is a painful required format, since users tend to ignore the order in which options occur. So it is +1 from my side to loosen the constraint. However, for the migration and im

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread Yang Wang
I think tison's answer is on point. All the Flink cli options should be specified before the user jar. We have a very clear help message. Syntax: run [OPTIONS] Best, Yang Jingsong Li wrote on Fri, Mar 6, 2020 at 10:27 PM: > Hi tison and Aljoscha, > > Do you think "--classpath can not be in front of jar fi

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread Jingsong Li
Hi tison and Aljoscha, Do you think "--classpath can not be in front of jar file" is an improvement? Or need documentation? Because I used to be confused. Best, Jingsong Lee On Fri, Mar 6, 2020 at 10:22 PM tison wrote: > I think the problem is that --classpath should be before the user jar, >

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread tison
This is because, in the implementation, command-line parsing uses "stopAtNonOptions" and stops at the first non-option argument, which is the user jar. All later arguments are regarded as args passed to the user main method. For users, when you run `./bin/flink run -h`, it prints Action "run" compiles and runs a program.
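To illustrate the stop-at-first-non-option behaviour described in this reply, here is a small sketch that uses Python's standard `getopt` module as a stand-in. It is not Flink's own parser. The option list and the helper name `parse_run_args` are assumptions chosen to mirror the command quoted in this thread.

```python
import getopt

# Long options taken from the command in this thread; "=" marks options that take a value.
LONG_OPTIONS = ["jobmanager=", "class=", "parallelism=", "detached", "classpath="]


def parse_run_args(argv):
    """Split argv into (parsed options, everything from the first non-option on)."""
    # getopt.getopt stops scanning at the first non-option argument,
    # so anything after the jar path is left untouched in `remaining`.
    options, remaining = getopt.getopt(argv, "", LONG_OPTIONS)
    return dict(options), remaining


if __name__ == "__main__":
    jar = "/opt/flink/job/kafkaDemo19-1.0-SNAPSHOT.jar"
    extra = "file:///opt/flink/job/fastjson-1.2.66.jar"

    # --classpath after the jar: it is not parsed as an option.
    print(parse_run_args(["--jobmanager", "ip:8081", "--detached", jar, "--classpath", extra]))

    # --classpath before the jar: it is parsed as an option.
    print(parse_run_args(["--jobmanager", "ip:8081", "--detached", "--classpath", extra, jar]))
```

In the first call the `--classpath` pair ends up in the remaining arguments, which in the Flink case would be handed to the user program's main method. In the second call it is recognised as an option.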

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread tison
I think the problem is that --classpath should be before the user jar, i.e., /opt/flink/job/kafkaDemo19-1.0-SNAPSHOT.jar Best, tison. Aljoscha Krettek wrote on Fri, Mar 6, 2020 at 10:03 PM: > Hi, > > first a preliminary question: does the jar file contain > com.alibaba.fastjson.JSON? Could you maybe list the

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread Jingsong Li
Hi ouywl, As far as I know, "--classpath" should be in front of the jar file, which means: /opt/flink/bin/flink run --jobmanager ip:8081 --class com.netease.java.TopSpeedWindowing --parallelism 1 --detached --classpath file:///opt/flink/job/fastjson-1.2.66.jar /opt/flink/job/kafkaDemo19-1.0-SNAPSHOT.jar You ca

Re: Writing retract streams to Kafka

2020-03-06 Thread Gyula Fóra
Thanks Kurt, I came to the same conclusions after trying what Jark provided. I can get similar behaviour if I reduce the grouping window to 1 sec but still keep the join window large. Gyula On Fri, Mar 6, 2020 at 3:09 PM Kurt Young wrote: > @Gyula Fóra I think your query is right, we should >

Re: Writing retract streams to Kafka

2020-03-06 Thread Kurt Young
@Gyula Fóra I think your query is right, we should produce insert-only results if you have event time and watermark defined. I've created https://issues.apache.org/jira/browse/FLINK-16466 to track this issue. Best, Kurt On Fri, Mar 6, 2020 at 12:14 PM Kurt Young wrote: > Actually this use case

Re: (DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread Aljoscha Krettek
Hi, first a preliminary question: does the jar file contain com.alibaba.fastjson.JSON? Could you maybe list the contents of the jar here? Best, Aljoscha On 06.03.20 13:25, ouywl wrote: Hi all When I start a flinkcluster in session mode, It include jm/tm. And then I submit a job like ‘

(DISSCUSS) flink cli need load '--classpath' files

2020-03-06 Thread ouywl
Hi all, when I start a Flink cluster in session mode, it includes jm/tm. Then I submit a job like ‘bin/flink run --jobmanager “ip:8081” --class path a.jar’. Even though a.jar is on all jm/tm and on the ‘bin/flink’ machine, it throws an exception “/opt/flink/bin/flink run --jobmanager ip:

Re: How do I get the value of 99th latency inside an operator?

2020-03-06 Thread Aljoscha Krettek
Hi, I'm afraid you're correct, this is currently not exposed and you would have to hack around some things and/or use reflection. AbstractStreamOperator has a field latencyStats [1], which is what holds the metrics. This is being updated from method reportOrUpdateLatencyMarker [2]. I hope

Re: The parallelism of sink is always 1 in sqlUpdate

2020-03-06 Thread Jingsong Li
Which sink do you use? It depends on sink implementation like [1] [1] https://github.com/apache/flink/blob/2b13a4155fd4284f6092decba867e71eea058043/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sinks/CsvTableSink.java#L147 Best, Jingsong Lee On Fri, Mar 6, 2020 at

Re: How to print the aggregated state everytime it is updated?

2020-03-06 Thread kant kodali
Hi, Thanks for this. So how can I emulate an infinite window while outputting every second? Simply put, I want to store the state forever (say years), and since rocksdb is my state backend I am assuming I can store the state until I run out of disk. However I want to see all the updates to the stat

Re: The parallelism of sink is always 1 in sqlUpdate

2020-03-06 Thread faaron zheng
Thanks for your attention. The parallelism of the sink's input is 500, and there is no order by or limit. Jingsong Li wrote on Fri, Mar 6, 2020 at 6:15 PM: > Hi faaron, > > For sink parallelism. > - What is the parallelism of the input of the sink? The sink parallelism should be > the same. > - Does your SQL have order by or limit? > Fli

Re: The parallelism of sink is always 1 in sqlUpdate

2020-03-06 Thread Jingsong Li
Hi faaron, For sink parallelism. - What is the parallelism of the input of the sink? The sink parallelism should be the same. - Does your SQL have order by or limit? Flink batch SQL does not support range partitioning yet, so it will use a single parallelism to run order by. For the memory of taskmanager. There is man

The parallelism of sink is always 1 in sqlUpdate

2020-03-06 Thread faaron zheng
Hi all, I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to execute my SQL, which looks like "insert overwrite ... select ...". But I find the parallelism of the sink is always 1, which is intolerable for large data. Why does this happen? Also, is there any guide to decide the memory of

Re: How to print the aggregated state everytime it is updated?

2020-03-06 Thread Congxian Qiu
Hi, from the description, you use a window operator set to event time. Then you should call `DataStream.assignTimestampsAndWatermarks` to set the timestamp and watermark. The window is triggered when the watermark exceeds the window end time. Best, Congxian kant kodali wrote on Wed, Mar 4, 2020 at 5:11 AM: > H
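The triggering rule in this reply can be sketched in plain Python, independent of the Flink API. The sketch assigns each event to a tumbling event-time window and emits a window only once the watermark has reached its end. The function name `tumbling_windows`, the watermark rule (highest timestamp seen minus a fixed bound) and the sample events are simplifying assumptions, not Flink's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def tumbling_windows(
    events: Iterable[Tuple[int, str]],
    window_size: int,
    max_out_of_orderness: int = 0,
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (window_end, values) once the watermark reaches the window end."""
    windows: Dict[int, List[str]] = defaultdict(list)  # window_end -> buffered values
    watermark = float("-inf")
    for timestamp, value in events:
        window_end = (timestamp // window_size + 1) * window_size
        if window_end <= watermark:
            continue  # window already fired; this sketch simply drops late events
        windows[window_end].append(value)
        # The watermark trails the highest timestamp seen so far.
        watermark = max(watermark, timestamp - max_out_of_orderness)
        # Fire every buffered window whose end the watermark has reached.
        for end in sorted(e for e in windows if e <= watermark):
            yield end, windows.pop(end)


if __name__ == "__main__":
    events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (25, "e")]
    for end, values in tumbling_windows(events, window_size=10):
        print(end, values)
```

The last window in the example never fires because no later event advances the watermark past its end. That is the same symptom as a job with no timestamps and watermarks assigned: the windows fill up but produce no output.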

Re: java.time.LocalDateTime in POJO type

2020-03-06 Thread Guowei Ma
Hi KristoffSC, As far as I know, there is no simple API to let you directly use the LocalTimeTypeInfo for LocalDateTime in your POJO class (maybe others know). If the serializer/deserializer of LocalDateTime is critical for you, there might be two methods. 1. Using the StreamExecutionEnv

Re: Backpressure and 99th percentile latency

2020-03-06 Thread Felipe Gutierrez
Thanks for the clear answer @Zhijiang. I am going to monitor inputQueueLength and outputQueueLength to check for a relation with backpressure, although I think it is better to use outPoolUsage and inPoolUsage according to [1]. However, in your opinion is it better (faster to see) to use inputQueueL

Re: How to override flink-conf parameters for SQL Client

2020-03-06 Thread Gyula Fóra
I feel that the current configuration section of the environment file assumes too much about what the user wants to configure. Configuring Table specific options is one thing but there are certainly many many cases where users are deploying jobs in a per-job-cluster mode and being able to override