Re: Monitor number of keys per Taskmanager

2019-10-25 Thread Flavio Pompermaier
Thnk you all for the reply. Maybe I could set up some metrics and count the keys per subtasks/slot by myself. However in the example of the playground there are 6 keys and they get distributed in the 2 slots as 4 and 2: is this a bug (since Piotr said that key groups can have sizes +/- 1 and in thi

Re: The RMClient's and YarnResourceManagers internal state about the number of pending container requests has diverged

2019-10-25 Thread Till Rohrmann
Great, thanks a lot Regina. I'll check the logs tomorrow. If info level is not enough, then I'll let you know. Cheers, Till On Fri, Oct 25, 2019, 21:20 Chan, Regina wrote: > Till, I added you to this lockbox area where you should be able to > download the logs. You should have also received an

Re: The RMClient's and YarnResourceManagers internal state about the number of pending container requests has diverged

2019-10-25 Thread Till Rohrmann
Could you provide me with the full logs of the cluster entrypoint/JobManager. I'd like to see what's going on there. Cheers, Till On Fri, Oct 25, 2019, 19:10 Chan, Regina wrote: > Till, > > > > We’re still seeing a large number of returned containers even with this > heart beat set to something

RE: The RMClient's and YarnResourceManagers internal state about the number of pending container requests has diverged

2019-10-25 Thread Chan, Regina
Till, We’re still seeing a large number of returned containers even with this heart beat set to something higher. Do you have hints as to what’s going on? It seems to be bursty in nature. The bursty requests cause the job to fail with the cluster not having enough resources because it’s in the

Re: Guarantee of event-time order in FlinkKafkaConsumer

2019-10-25 Thread Fabian Hueske
Hi Wojciech, I posted an answer on StackOverflow. Best, Fabian Am Do., 24. Okt. 2019 um 13:03 Uhr schrieb Wojciech Indyk < wojciechin...@gmail.com>: > Hi! > I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee of correct order > of events by event timestamp. I generate periodic watermarks ev

Re: Flink 1.5+ performance in a Java standalone environment

2019-10-25 Thread Fabian Hueske
Hi Jakub, I had a look at the changes of Flink 1.5 [1] and didn't find anything obvious. Something that might cause a different behavior is the new deployment and process model (FLIP-6). In Flink 1.5, there is a switch to disable it and use the previous deployment mechanism. You could try to disa

Re: Using STSAssumeRoleSessionCredentialsProvider for cross account access

2019-10-25 Thread Fabian Hueske
Hi Vinay, Maybe Gordon (in CC) has an idea about this issue. Best, Fabian Am Do., 24. Okt. 2019 um 14:50 Uhr schrieb Vinay Patil < vinay18.pa...@gmail.com>: > Hi, > > Can someone pls help here , facing issues in Prod . I see the following > ticket in unresolved state. > > https://issues.apache.

Re: Can a Flink query outputs nested json?

2019-10-25 Thread Fabian Hueske
Hi, I did not understand what you are trying to achieve. Which field of the input table do you want to write to the output table? Flink SQL> insert into nestedSink select nested from nestedJsonStream; [INFO] Submitting SQL update statement to the cluster... [ERROR] Could not execute SQL statement

Re: JDBCInputFormat does not support json type

2019-10-25 Thread Fabian Hueske
Hi Fanbin, One approach would be to ingest the field as a VARCHAR / String and implement a Scalar UDF to convert it into a nested tuple. The UDF could use the code of the flink-json module. AFAIK, there is some work on the way to add built-in JSON functions. Best, Fabian Am Do., 24. Okt. 2019 u

Re: Flink 1.9 measuring time taken by each operator in DataStream API

2019-10-25 Thread Fabian Hueske
Hi Komal, Measuring latency is always a challenge. The problem here is that your functions are chained, meaning that the result of a function is directly passed on to the next function and only when the last function emits the result, the first function is called with a new record. This makes meas

Re: Issue with writeAsText() to S3 bucket

2019-10-25 Thread Fabian Hueske
Hi Michael, One reason might be that S3's file listing command is only eventually consistent. It might take some time until the file appears and is listed. Best, Fabian Am Mi., 23. Okt. 2019 um 22:41 Uhr schrieb Nguyen, Michael < michael.nguye...@t-mobile.com>: > Hello all, > > > > I am running

Re: [Problem] Unable to do join on TumblingEventTimeWindows using SQL

2019-10-25 Thread Fabian Hueske
Hi, the exception says: "Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.". The problem is that your query first joins the two tables without a temporal condition and then wants to do a window

RE: Does operator uid() have to be unique across all jobs?

2019-10-25 Thread min.tan
Thank you for your reply. Any tool enables us to inspect (list) statically all the "uid"ed operators or all the operators? for a jar? Also addSink and addSource are not on the operator list https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/ But they both have an

Tumbling tables in the SQL API

2019-10-25 Thread A. V.
Hi, In the SQL API I see the query below. I want to know how I can make tumbling tables based on amount of rows. So I want to make a window for row 1-10, 11-20 etc. It is also good if the windowing takes place on a Integer ID column. How can I do this? Table result1 = tableEnv.sqlQuery( "SE

Re: Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread Chesnay Schepler
Simple, you pick the version that is listed on the download page for the Flink version you are using. We have not done any tests as to whether hadoop 2.8.3 works with hadoop 2.8.5 . On 25/10/2019 10:36, Jeff Zhang wrote: Thanks Chesnay, is there any document to explain which version of flink

Re: Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread Jeff Zhang
Thanks Chesnay, is there any document to explain which version of flink-shaded-hadoop-jar should I use for specific version of flink ? e.g. The document of flink 1.9 here https://flink.apache.org/downloads.html#apache-flink-191 point me to flink-shaded-hadoop-jar 7.0, but the latest version of flin

Re: Does operator uid() have to be unique across all jobs?

2019-10-25 Thread Dian Fu
It means that there is an operator state which has no corresponding operator in the new job. It usually indicates that the uid of a stateful operator has changed. > 在 2019年10月25日,下午4:12, 写道: > > Thanks for your reply. > > Our sources and sinks are connected to Kafka, therefore they are statf

Re: Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread Chesnay Schepler
If you need hadoop, but the approach outlined here doesn't work for you, then you still need a flink-shaded-hadoop-jar that you can download here

RE: Does operator uid() have to be unique across all jobs?

2019-10-25 Thread min.tan
Thanks for your reply. Our sources and sinks are connected to Kafka, therefore they are statful. We did not set uid on them but only name(). The log says Caused by: java.lang.IllegalStateException: Failed to rollback to checkpoint/savepoint file:/var/flink/data-remote/savepoint-00-dae01410

Re: Does operator uid() have to be unique across all jobs?

2019-10-25 Thread Dian Fu
Hi Min, It depends on the source/sink implementation. If the source/sink implementation uses state, uid should be set. So you can always set the uid in this case and then you don't need to care about the implementation details of the source/sink you used. name() doesn't have such functionality

Re: Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread vino yang
Hi Jeff, Maybe @Chesnay Schepler could tell you the answer. Best, Vino Jeff Zhang 于2019年10月25日周五 下午3:54写道: > Hi all, > > There's no new flink shaded release for flink 1.9, so I'd like to confirm > with you can flink 1.9.1 use flink-shaded 2.8.3-1.8.2 ? Or the flink shaded > is not necessary f

Re: How to create an empty test stream

2019-10-25 Thread vino yang
Yes, this is also a good idea if you don't ask for this stream to be empty from the source. Best, Dmitry Dmitry Minaev 于2019年10月25日周五 下午12:21写道: > Thanks, I'll check it out. > Actually I realized I can always put a filter operator that'll effectively > remove everything from the stream. > > --

Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread Jeff Zhang
Hi all, There's no new flink shaded release for flink 1.9, so I'd like to confirm with you can flink 1.9.1 use flink-shaded 2.8.3-1.8.2 ? Or the flink shaded is not necessary for flink 1.9 afterwards ? https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop2 -- Best Regards Jef

RE: Does operator uid() have to be unique across all jobs?

2019-10-25 Thread min.tan
Thank you very much for your helpful response. Our new production release complains about the an uid mismatch (we use exactly once checkpoints). I hope I understand your correctly: map and print are certainly stateless, therefore no uid is required. What about addSink and addSoure? Do they need

RE: Could not load the native RocksDB library

2019-10-25 Thread Patro, Samya
Hello Thad, In my case , the issue was fixed after upgrading the os version , and gcc version. From: Thad Truman [mailto:ttru...@neovest.com] Sent: Tuesday, October 22, 2019 11:03 PM To: Andrey Zagrebin; Haibo Sun Cc: Patro, Samya [Engineering]; user@flink.apache.org; Bari, Swapnil [Engineering