Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jark Wu
Yes, that's right. On Tue, 12 May 2020 at 10:55, Jiahui Jiang wrote: > Thank you for confirming! > > Just want to make sure my understanding of the internal implementation is > correct: > > When applying an over window and ordered by processing time using SQL, the > datastream plan it translates

Re: Broadcast state vs data enrichment

2020-05-11 Thread Manas Kale
Sure. Apologies for not making this clear enough. > each operator only stores what it needs. Let's imagine this setup: BROADCAST STREAM config-stream | |
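A minimal sketch of the broadcast-state pattern being discussed, assuming a hypothetical Config POJO carried by the config-stream and an Event stream that reads only the parameter it needs from broadcast state; the class, field and stream names are illustrative, not from the thread.
```
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class ConfigBroadcastSketch {

    // Descriptor for the broadcast state that holds the config, keyed by a config key.
    static final MapStateDescriptor<String, Config> CONFIG_DESCRIPTOR =
            new MapStateDescriptor<>("config", BasicTypeInfo.STRING_TYPE_INFO,
                    TypeInformation.of(Config.class));

    static DataStream<Event> enrich(DataStream<Event> events, DataStream<Config> configStream) {
        BroadcastStream<Config> broadcastConfig = configStream.broadcast(CONFIG_DESCRIPTOR);
        return events
                .connect(broadcastConfig)
                .process(new BroadcastProcessFunction<Event, Config, Event>() {
                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx,
                                               Collector<Event> out) throws Exception {
                        // The operator reads only the slice of config it needs.
                        Config config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get(event.key);
                        event.threshold = (config == null) ? 0L : config.threshold;
                        out.collect(event);
                    }

                    @Override
                    public void processBroadcastElement(Config config, Context ctx,
                                                        Collector<Event> out) throws Exception {
                        // Store (or overwrite) the broadcast config entry.
                        ctx.getBroadcastState(CONFIG_DESCRIPTOR).put(config.key, config);
                    }
                });
    }

    // Illustrative POJOs, not from the thread.
    public static class Config { public String key; public long threshold; }
    public static class Event { public String key; public long threshold; }
}
```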

Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jiahui Jiang
Thank you for confirming! Just want to make sure my understanding of the internal implementation is correct: when applying an over window and ordered by processing time using SQL, the datastream plan it translates into doesn't actually have any order-by logic. It just sequentially processes all t

Re: Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Jark Wu
Thanks for the investigation, and I think yes, this is a bug and is going to be fixed in FLINK-16160. Best, Jark On Tue, 12 May 2020 at 02:28, Khachatryan Roman wrote: > Hi Yuval, > > Thanks for reporting this issue. I'm pulling in Timo and Jark who are > working on the SQL component. They migh

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Xintong Song
Hi Jacky, Could you search for "Application Master start command:" in the debug log and post the result and a few lines before & after that? This is not included in the clip of attached log file. Thank you~ Xintong Song On Tue, May 12, 2020 at 5:33 AM Jacky D wrote: > hi, Robert > > Thanks

Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jark Wu
Hi Jiahui, Yes, if they arrive at the same millisecond, they are preserved in arrival order. Best, Jark On Mon, 11 May 2020 at 23:17, Jiahui Jiang wrote: > Hello! I'm writing a SQL query with an OVER window function ordered by > processing time. > > I'm wondering since timestamp is only

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Jacky D
Hi, Robert. Thanks so much for the quick reply. I changed the log level to debug and attached the log file. Thanks, Jacky Robert Metzger wrote on Mon, May 11, 2020 at 4:14 PM: > Thanks a lot for posting the full output. > > It seems that Flink is passing an invalid list of arguments to the JVM. > Can you > - set

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hello Roman, PFB my response - As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? Yes, correct: distinct events and devices. Each device emits these events. > Based on data protocol I have 4-5 topics

ProducerRecord with Kafka Sink for 1.8.0

2020-05-11 Thread Nick Bendtner
Hi guys, I use the 1.8.0 version of flink-connector-kafka. Do you have any recommendations on how to produce a ProducerRecord from a Kafka sink? Looking to add support for Kafka headers, therefore thinking about ProducerRecord. If you have any thoughts, they are highly appreciated. Best, Nick.
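For context, a minimal sketch assuming the universal Kafka connector shipped from Flink 1.9 onward, where KafkaSerializationSchema lets the sink build a full ProducerRecord (including headers); the 1.8.0 connector's serialization schemas do not expose headers. The topic name, header key and delivery semantic below are illustrative.
```
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HeaderAwareKafkaSinkSketch {

    static FlinkKafkaProducer<String> build(Properties kafkaProps) {
        KafkaSerializationSchema<String> schema = new KafkaSerializationSchema<String>() {
            @Override
            public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
                ProducerRecord<byte[], byte[]> record = new ProducerRecord<>(
                        "output-topic", element.getBytes(StandardCharsets.UTF_8));
                // Kafka headers are attached directly on the ProducerRecord.
                record.headers().add("source", "flink".getBytes(StandardCharsets.UTF_8));
                return record;
            }
        };
        return new FlinkKafkaProducer<>(
                "output-topic", schema, kafkaProps, FlinkKafkaProducer.Semantic.AT_LEAST_ONCE);
    }
}
```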

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread Khachatryan Roman
Hi Hemant, As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? > Based on data protocol I have 4-5 topics. Currently the data for a single event is being pushed to a partition of the kafka topic(produ

Re: Not able to implement a use case

2020-05-11 Thread Jaswin Shah
If I go with the Table API, can I write the streams to Hive, or is it only for batch processing as of now? Get Outlook for Android From: Khachatryan Roman Sent: Tuesday, May 12, 2020 1:49:10 AM To: Jaswin Shah Cc: user@flink.apache.org Subje

Re: Not able to implement a use case

2020-05-11 Thread Khachatryan Roman
Hi Jaswin, Currently, the DataStream API doesn't support outer joins. As a workaround, you can use the coGroup function [1]. Hive is also not supported by the DataStream API, though it is supported by the Table API [2]. [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/commo
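A minimal sketch of the coGroup workaround, assuming two streams of a hypothetical Order type keyed by orderId, with event-time timestamps and watermarks already assigned; unlike a join, coGroup is also invoked when one side of the window is empty, which is how unmatched records surface. The names and the 5-minute window are illustrative.
```
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class OuterJoinViaCoGroupSketch {

    static DataStream<String> discrepancies(DataStream<Order> left, DataStream<Order> right) {
        return left.coGroup(right)
                .where(order -> order.orderId)
                .equalTo(order -> order.orderId)
                .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                .apply(new CoGroupFunction<Order, Order, String>() {
                    @Override
                    public void coGroup(Iterable<Order> lefts, Iterable<Order> rights,
                                        Collector<String> out) {
                        boolean hasLeft = lefts.iterator().hasNext();
                        boolean hasRight = rights.iterator().hasNext();
                        // coGroup fires even when one side is empty, so unmatched
                        // (outer) records can be reported here.
                        if (hasLeft && !hasRight) {
                            out.collect("missing in right: " + lefts.iterator().next().orderId);
                        } else if (!hasLeft && hasRight) {
                            out.collect("missing in left: " + rights.iterator().next().orderId);
                        }
                    }
                });
    }

    public static class Order { public String orderId; }
}
```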

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Robert Metzger
Thanks a lot for posting the full output. It seems that Flink is passing an invalid list of arguments to the JVM. Can you - set the root log level in conf/log4j-yarn-session.properties to DEBUG - then launch the YARN session - share the log file of the yarn session on the mailing list? I'm partic

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Jacky D
Hi, Robert. Yes, I tried to retrieve more log info from the YARN UI; the full logs are shown below. This happens when I try to create a Flink YARN session on EMR with the JITWatch configuration set up. 2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error whi

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Robert Metzger
Hey Jacky, The error says "The YARN application unexpectedly switched to state FAILED during deployment.". Have you tried retrieving the YARN application logs? Do the YARN UI / resource manager logs reveal anything about the reason for the deployment failure? Best, Robert On Mon, May 11, 2020 at

Fwd: Flink Memory analysis on AWS EMR

2020-05-11 Thread Jacky D
-- Forwarded message - From: Jacky D Date: Mon, May 11, 2020, 3:12 PM Subject: Re: Flink Memory analysis on AWS EMR To: Khachatryan Roman Hi, Roman. Thanks for the quick response. I tried without the LogFile option but failed with the same error. I'm currently using Flink 1.6 https://ci.apache.org

Re: Flink Memory analysis on AWS EMR

2020-05-11 Thread Khachatryan Roman
Hi Jacky, Did you try it without -XX:LogFile=${FLINK_LOG_PREFIX}.jit ? Probably, Flink can't write to this location. Also, you can try other tools described at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html Regards, Roman On Mon, May 11, 2020 at 5

Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hi, I have different events from a device which constitute different metrics for the same device. Each of these events is produced by the device at intervals of a few milliseconds to a minute. Event1(Device1) -> Stream1 -> Metric 1 Event2 (Device1) -> Stream2 -> Metric 2 ... .. ... Even
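A minimal sketch of one way to wire this up, assuming every topic's records can be mapped onto a common Metric type keyed by deviceId; the topic names, the string schema and the Metric fields are illustrative, not from the thread.
```
import java.util.Arrays;
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class DeviceMetricsSketch {

    static KeyedStream<Metric, String> byDevice(StreamExecutionEnvironment env, Properties kafkaProps) {
        // One consumer can subscribe to several topics; alternatively keep one
        // consumer per protocol and union the resulting streams.
        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
                Arrays.asList("protocol-a", "protocol-b", "protocol-c"),
                new SimpleStringSchema(), kafkaProps);
        DataStream<Metric> metrics = env.addSource(consumer).map(Metric::parse);
        // Keying by deviceId brings all event types of a device to the same parallel
        // task, where they can be windowed or joined to build the per-device picture.
        return metrics.keyBy(metric -> metric.deviceId);
    }

    public static class Metric {
        public String deviceId;
        public String eventType;
        public double value;

        // Illustrative parsing stub; a real job would decode the protocol payload.
        static Metric parse(String payload) { return new Metric(); }
    }
}
```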

Re: Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Khachatryan Roman
Hi Őrhidi, Can you please provide some details about the errors you get? Regards, Roman On Mon, May 11, 2020 at 9:32 AM Őrhidi Mátyás wrote: > Dear Community, > > I'm having troubles testing jobs against a secure Hadoop cluster. Is that > possible? The mini cluster seems to not load any secur

Re: Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Khachatryan Roman
Hi Yuval, Thanks for reporting this issue. I'm pulling in Timo and Jark who are working on the SQL component. They might be able to help you with your problem. Regards, Roman On Mon, May 11, 2020 at 9:10 AM Yuval Itzchakov wrote: > Hi, > While migrating from Flink 1.9 -> 1.10 and from the old

Re: Broadcast state vs data enrichment

2020-05-11 Thread Khachatryan Roman
Hi Manas, The approaches you described look the same: > each operator only stores what it needs. > each downstream operator will "strip off" the config parameter that it needs. Can you please explain the difference? Regards, Roman On Mon, May 11, 2020 at 8:07 AM Manas Kale wrote: > Hi, > I

Re: MongoSink

2020-05-11 Thread Khachatryan Roman
Hi Aissa, What is the BSONWritable you pass from map to sink? I guess it's not serializable, which causes Flink to fall back to Kryo, which fails. Regards, Roman On Sun, May 10, 2020 at 10:42 PM Aissa Elaffani wrote: > Hello Guys, > I am trying to sink my data to MongoDB, but I got some errors. I am > shar
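A minimal sketch of one way around the Kryo issue: convert the records to plain JSON strings before the sink (so BSONWritable never has to be serialized between operators) and write them from a RichSinkFunction with the synchronous MongoDB Java driver. The connection string, database and collection names are illustrative.
```
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.bson.Document;

public class MongoJsonSinkSketch extends RichSinkFunction<String> {

    private transient MongoClient client;
    private transient MongoCollection<Document> collection;

    @Override
    public void open(Configuration parameters) {
        // The client is created per parallel sink instance, never shipped with the job graph.
        client = MongoClients.create("mongodb://localhost:27017");
        collection = client.getDatabase("sensors").getCollection("readings");
    }

    @Override
    public void invoke(String json, Context context) {
        collection.insertOne(Document.parse(json));
    }

    @Override
    public void close() {
        if (client != null) {
            client.close();
        }
    }
}
```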

Not able to implement a use case

2020-05-11 Thread Jaswin Shah
Hi, I want to implement the below use case in my application: I am doing an interval join between two data streams and then, in a process function, catching the discrepant results of the join. Joining is done on the key orderId. Now, I want to identify all the messages in both datastreams which are n
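A minimal sketch of the interval join described here, assuming two streams of a hypothetical Order type keyed by orderId with event-time timestamps already assigned; note that the ProcessJoinFunction only ever sees matched pairs, which is why detecting unmatched messages needs the coGroup approach suggested earlier in this thread. Stream names and the 10-minute bounds are illustrative.
```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class OrderIntervalJoinSketch {

    static DataStream<String> joined(DataStream<Order> cartOrders, DataStream<Order> paymentOrders) {
        return cartOrders
                .keyBy(order -> order.orderId)
                .intervalJoin(paymentOrders.keyBy(order -> order.orderId))
                .between(Time.minutes(-10), Time.minutes(10))
                .process(new ProcessJoinFunction<Order, Order, String>() {
                    @Override
                    public void processElement(Order left, Order right, Context ctx,
                                               Collector<String> out) {
                        // Only matched pairs reach this point; unmatched records are dropped.
                        out.collect("matched orderId=" + left.orderId);
                    }
                });
    }

    public static class Order { public String orderId; }
}
```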

Preserve record orders after WINDOW function

2020-05-11 Thread Jiahui Jiang
Hello! I'm writing a SQL query with an OVER window function ordered by processing time. I'm wondering, since the timestamp is only millisecond granularity: for a query using an over window sorted on the processing time column, for example, ``` SELECT col1, max(col2) OVER (PARTITION BY col1, ORDER BY
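For reference, a minimal sketch of the full form of such a query via the Table API, assuming a registered table "events" with a processing-time attribute proctime; the table and column names are illustrative. As confirmed earlier in the thread, rows landing in the same processing-time millisecond keep their arrival order.
```
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class OverWindowSketch {

    static Table maxPerKey(TableEnvironment tEnv) {
        // Over aggregation partitioned by col1 and ordered by the processing-time attribute.
        return tEnv.sqlQuery(
                "SELECT col1, " +
                "       MAX(col2) OVER (" +
                "         PARTITION BY col1 " +
                "         ORDER BY proctime " +
                "         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_col2 " +
                "FROM events");
    }
}
```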

Flink Memory analysis on AWS EMR

2020-05-11 Thread Jacky D
Hi all, I'm encountering a memory issue with my Flink job on AWS EMR (current Flink version 1.6.2). I would like to find the root cause, so I'm trying JITWatch on my local standalone cluster, but I cannot use it on EMR. After I add the following config in flink-conf.yaml: env.java.opts: "-XX:+UnlockDia
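For reference, a sketch of what such a flink-conf.yaml entry typically looks like for JITWatch; the exact flag list should be verified against the application profiling documentation linked earlier in this thread for your Flink version, and ${FLINK_LOG_PREFIX} comes from Flink's startup scripts.
```
# flink-conf.yaml (sketch; verify the flags against the application profiling docs)
env.java.opts: "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit"
# -XX:+PrintAssembly can be added as well if the hsdis library is installed.
```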

Re: Flink REST API side effect?

2020-05-11 Thread Chesnay Schepler
yes, that is correct. On 11/05/2020 14:28, Tomasz Dudziak wrote: Thanks for reply. So do I understand correctly if I say that whenever you query job status it gets cached for a configurable amount of time and subsequent queries within that time period will not show any change? *From:*Chesna

RE: Flink REST API side effect?

2020-05-11 Thread Tomasz Dudziak
Thanks for the reply. So do I understand correctly that whenever you query the job status it gets cached for a configurable amount of time, and subsequent queries within that time period will not show any change? From: Chesnay Schepler Sent: 11 May 2020 13:20 To: Tomasz Dudziak ; user@flink.apa

Re: Flink REST API side effect?

2020-05-11 Thread Chesnay Schepler
This is expected: the backing data structure is cached for a while so we never hammer the JobManager with requests. IIRC this is controlled via "web.refresh-interval", with the default being 3 seconds. On 11/05/2020 14:10, Tomasz Dudziak wrote: Hi, I found an interesting behaviour of the R

Flink REST API side effect?

2020-05-11 Thread Tomasz Dudziak
Hi, I found an interesting behaviour of the REST API in my automated system tests using that API, where I was getting the status of a purposefully failing job. If you query job details immediately after job submission, subsequent queries will return its status as RUNNING for a moment until Flink's r

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread Tzu-Li (Gordon) Tai
In that case, the most likely cause would be https://issues.apache.org/jira/browse/FLINK-16313, which is included in Flink 1.10.1 (to be released). The release candidates for Flink 1.10.1 are currently ongoing; would it be possible for you to try that out and see if the error still occurs? On Mon

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread luisfaamaral
Thanks Gordon and Seth for the reply. So.. the main project contains the below flink dependencies... And the state processor project contains the following: 1.9.0 At first sight I would say all the libraries in both projects match the 1.9.0 Flink libraries. -- Sent from: http://apach

Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Őrhidi Mátyás
Dear Community, I'm having trouble testing jobs against a secure Hadoop cluster. Is that possible? The mini cluster seems to not load any security modules. Thanks, Matyas

Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Yuval Itzchakov
Hi, While migrating from Flink 1.9 -> 1.10 and from the old planner to Blink, I've run into an issue where the Blink Planner doesn't take into account the RowtimeAttribute defined on a custom table source. I've opened an issue: https://issues.apache.org/jira/browse/FLINK-17600 and was wondering if
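For context, a minimal sketch of how a custom TableSource declares its rowtime attribute via DefinedRowtimeAttributes, which is the contract the issue above concerns; the "ts" field name and the 5-second out-of-orderness bound are illustrative, not from the thread.
```
import java.util.Collections;
import java.util.List;
import org.apache.flink.table.sources.DefinedRowtimeAttributes;
import org.apache.flink.table.sources.RowtimeAttributeDescriptor;
import org.apache.flink.table.sources.StreamTableSource;
import org.apache.flink.table.sources.tsextractors.ExistingField;
import org.apache.flink.table.sources.wmstrategies.BoundedOutOfOrderTimestamps;
import org.apache.flink.types.Row;

public abstract class RowtimeTableSourceSketch
        implements StreamTableSource<Row>, DefinedRowtimeAttributes {

    @Override
    public List<RowtimeAttributeDescriptor> getRowtimeAttributeDescriptors() {
        return Collections.singletonList(
                new RowtimeAttributeDescriptor(
                        "rowtime",                              // attribute exposed to SQL
                        new ExistingField("ts"),                // extract from an existing field
                        new BoundedOutOfOrderTimestamps(5000L)  // watermark strategy
                ));
    }
}
```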