Re: checkpoint/taskmanager is stuck, deadlock on LocalBufferPool

2018-10-22 Thread Yan Zhou [FDS Science]
I am using flink 1.5.3 From: Yan Zhou [FDS Science] Sent: Monday, October 22, 2018 11:26 To: user@flink.apache.org Subject: checkpoint/taskmanager is stuck, deadlock on LocalBufferPool Hi, My application suddenly stuck and completely doesn't move forward

checkpoint/taskmanager is stuck, deadlock on LocalBufferPool

2018-10-22 Thread Yan Zhou [FDS Science]
Hi, My application suddenly stuck and completely doesn't move forward after running for a few days. No exceptions are found. From the thread dump, I can see that the operator threads and checkpoint threads deadlock on LocalBufferPool. LocalBufferPool is not able to request memory and keep the lo

Re: Taskmanager process memory increasing always

2018-09-04 Thread Yan Zhou [FDS Science]
I have met similar issue. Yarn kills the TaskManagers, as their memory usage grows to the limit. I think it might be rocksdb causing the problem. Is there any way to debug the memory usage of rocksdb backend? Best Yan From: YennieChen88 Sent: Wednesday, Augus

Re: checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread Yan Zhou [FDS Science]
Thank you Vino. It is very helpful. From: vino yang Sent: Tuesday, August 7, 2018 7:22:50 PM To: Yan Zhou [FDS Science] Cc: user Subject: Re: checkpoint recovery behavior when kafka source is set to start from timestamp Hi Yan Zhou: I think the java doc of the

checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread Yan Zhou [FDS Science]
Hi Experts, In my application, the kafka source is set to start from a specified timestamp, by calling method FlinkKafkaConsumer010#setStartFromTimestamp(long startupOffsetsTimestamp). If the application have run a while and then recover from a checkpoint because of failure, what's the offse

Re: [DISCUSS] Flink 1.6 features

2018-06-05 Thread Yan Zhou [FDS Science]
+1 on https://issues.apache.org/jira/browse/FLINK-5479 [FLINK-5479] Per-partition watermarks in ... issues.apache.org Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-waterm

Re: NPE in flink sql over-window

2018-06-04 Thread Yan Zhou [FDS Science]
] [label:state_timeout] ontimer at 1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState [{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}] [1] : https://issues.apache.org/jira/browse/FLINK-9524 From: Yan Zhou [FDS Science] Sent: Monday, June 4

Re: NPE in flink sql over-window

2018-06-04 Thread Yan Zhou [FDS Science]
: Yan Zhou [FDS Science] Cc: Dawid Wysakowicz; user Subject: Re: NPE in flink sql over-window Hi Yan, Thanks for the details and for digging into the issue. If I got it right, the NPE caused the job failure and recovery (instead of being the result of a recovery), correct? Best, Fabian 2018-05-31

Re: NPE in flink sql over-window

2018-05-30 Thread Yan Zhou [FDS Science]
file a JIRA for that? Best, Dawid On 30/05/18 08:27, Yan Zhou [FDS Science] wrote: I also get warnning that CodeCache is full around that time. It's printed by JVM and doesn't have timestamp. But I suspect that it's because so many failure recoveries from checkpoint and the sql q

Re: NPE in flink sql over-window

2018-05-29 Thread Yan Zhou [FDS Science]
0007fa50c00] total_blobs=54308 nmethods=53551 adapters=617 compilation: disabled (not enough contiguous free space left) ____ From: Yan Zhou [FDS Science] Sent: Tuesday, May 29, 2018 10:52:18 PM To: user@flink.apache.org Subject: NPE in flink sql over-window Hi, I am

Re: Task did not exit gracefully and lost TaskManager

2018-05-29 Thread Yan Zhou [FDS Science]
doop-prod04-mp:35187/user/taskmanager_0. 2018-05-29 14:43:34,669 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager. ____________ From: Yan Zhou [FDS Science] Sent: Tuesday, May 29, 2018 11:04:52 PM

Task did not exit gracefully and lost TaskManager

2018-05-29 Thread Yan Zhou [FDS Science]
Hi, when I stop my flink application in standalone cluster, one of the tasks can NOT exit gracefully. And the task managers are lost(or detached?). I can't see them in the web ui. However, the task managers are still running in the slave servers. What could be the possible cause? My applicati

NPE in flink sql over-window

2018-05-29 Thread Yan Zhou [FDS Science]
Hi, I am using flink sql 1.5.0. My application throws NPE. And after it recover from checkpoint automatically, it throws NPE immediately from same line of code. My application read message from kafka, convert the datastream into a table, issue an Over-window aggregation and write the result in

Re: increasing parallelism increases the end2end latency in flink sql

2018-05-23 Thread Yan Zhou [FDS Science]
The BoundedOutOfOrdernessTimestampExtractor is assigned to datastream after kafka consumer. The graph is like: KafkaSource-> map2Pojo -> BoundedOutOfOrdernessTimestampExtractor -> Table -> .. ________ From: Yan Zhou [FDS Science] Sent: Wednesday, Ma

Re: How to restore state from savepoint with flink SQL

2018-05-23 Thread Yan Zhou [FDS Science]
Best Yan From: Fabian Hueske Sent: Wednesday, May 23, 2018 3:18:08 AM To: Yan Zhou [FDS Science] Cc: user@flink.apache.org Subject: Re: How to restore state from savepoint with flink SQL Hi, At the moment, you can only restore a query from a savepoint if the que

increasing parallelism increases the end2end latency in flink sql

2018-05-23 Thread Yan Zhou [FDS Science]
Hi, My application assigned timestamp to kafka event with BoundedOutOfOrdernessTimestampExtractor then converted them to a table. Finally flink SQL over-window aggregation is run against the table. When I double the parallelism of my flink application, the end2end latency is doubled. What c

How to restore state from savepoint with flink SQL

2018-05-22 Thread Yan Zhou [FDS Science]
Hi, My application use flink SQL and it's running in production. How can i update my application with topology changes yet doesn't lose the state data? Is there a way to assign UID to the operators that are translated from SQL? If not, is it intended and whats the rationality behind it? Acco

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
nvm, I figure it out. The event is not process once it's arrived. It's registered to processed in event time. It make sense. best Yan ____ From: Yan Zhou [FDS Science] Sent: Wednesday, April 18, 2018 12:56:58 PM To: Fabian Hueske Cc: user Subject: Re: w

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
avoid that? Looking forward to hear your reply. Best Yan From: Fabian Hueske Sent: Wednesday, April 18, 2018 1:04:43 AM To: Yan Zhou [FDS Science] Cc: user Subject: Re: why doesn't the over-window-aggregation sort the element(considering watermark) before p

why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-17 Thread Yan Zhou [FDS Science]
Hi, I use bounded over-window aggregation in my application. However, sometimes some input elements are "discarded" and not generating output. By reading the source code of RowTimeBoundedRangeOver.scala, I realize the record is actually discarded if it is out of order. Please see the quoted c

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Yan Zhou [FDS Science]
: Thursday, March 15, 2018 11:55:12 AM To: Yan Zhou [FDS Science] Cc: user@flink.apache.org Subject: Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time I see... Another issue might be the frequency with which you emit watermarks (in case you use

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Yan Zhou [FDS Science]
stand. Best Yan From: Fabian Hueske Sent: Wednesday, March 14, 2018 12:02:01 PM To: Yan Zhou [FDS Science] Cc: user@flink.apache.org Subject: Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time Hi, It is

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-14 Thread Yan Zhou [FDS Science]
lease-1.4/dev/event_timestamp_extractors.html#assigners-allowing-a-fixed-amount-of-lateness> is used. So even the slowest source task is not that slow. Best Yan From: Fabian Hueske Sent: Wednesday, March 14, 2018 3:28 AM To: Yan Zhou [FDS Science] Cc: user@flink.apache

flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-13 Thread Yan Zhou [FDS Science]
Hi, I am using flink sql in my application. It simply reads records from kafka source, converts to table, then runs an query to have over window aggregation for each record. Time lag watermark assigner with 10ms time lag is used. The performance is not ideal. the end-to-end latency, which is th

Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-08 Thread Yan Zhou [FDS Science]
From: Xingcan Cui Sent: Thursday, March 8, 2018 8:21:42 AM To: Timo Walther Cc: user; Yan Zhou [FDS Science] Subject: Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column Hi Yan & Timo, this is confirmed to be a bug and I’ve created an

Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-07 Thread Yan Zhou [FDS Science]
tion by id order by eventTs rows between 100 preceding and current row) as cnt1 from myTable"; Best Yan From: xccui-foxmail Sent: Wednesday, March 7, 2018 8:10 PM To: Yan Zhou [FDS Science] Cc: user@flink.apache.org Subject: Re: flink sql timed-window join th

flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-07 Thread Yan Zhou [FDS Science]
Hi experts, I am using flink table api to join two tables, which are datastream underneath. However, I got an assertion error of "java.lang.AssertionError: mismatched type $1 TIMESTAMP(3)" on rowtime column. Below is more details: There in only one kafka data source, which is then converted to

is it possible to convert "retract" datastream to table

2018-01-09 Thread Yan Zhou [FDS Science]
Hi, There are APIs to convert a dynamic table to retract stream. Is there any way to construct a "retract" data stream and convert it into table? I want to read the change log of relational database from kafka, "apply" the changes within flink( by creating CRow DataStream), register/create a t

Re: how does time-windowed join and Over-Window Aggregation implemented in flink SQL?

2017-12-14 Thread Yan Zhou [FDS Science]
Thanks for the information. Best Yan From: Xingcan Cui Date: Wednesday, December 13, 2017 at 6:02 PM To: "Yan Zhou [FDS Science]" Cc: "user@flink.apache.org" Subject: Re: how does time-windowed join and Over-Window Aggregation implemented in flink SQL? Hi Yan Zhou, as

how does time-windowed join and Over-Window Aggregation implemented in flink SQL?

2017-12-13 Thread Yan Zhou [FDS Science]
Hi, I am building a data pipeline with a lot of streaming join and over window aggregation. And flink SQL have these feature supported. However, there is no similar DataStream APIs provided(maybe there is and I didn't find them. please point out if there is). I got confused because I assume th

Re: DataStream joining without window

2017-10-11 Thread Yan Zhou [FDS Science] ­
ments using the Flink state interfaces > you can switch the state backend to the RocksDB backend and if you have > concerns about the state growing too big. > > Best, > Aljoscha > > > On 9. Oct 2017, at 23:11, Yan Zhou [FDS Science] ­ > wrote: > > > > It seems lik

DataStream joining without window

2017-10-09 Thread Yan Zhou [FDS Science] ­
It seems like flink only supports DataStream joining within same time window. Why is it restricted in this way? I think I can implement a TwoInputStreamOperator to join two DataStreams without considering the window. And inside the operator, create two state to cache records of two streams and jo

Re: How to clear registered timers for a merged window?

2017-09-26 Thread Yan Zhou [FDS Science] ­
id not > trigger, because currently some of the data structures used for timers do > not support random deletes efficiently. For the second part of the question > about keeping the state of merged windows, I added Aljoscha in CC who might > provide more information about the topic. >

How to clear registered timers for a merged window?

2017-09-25 Thread Yan Zhou [FDS Science] ­
Hi, I am implementing a merge-able trigger, and having a problem in clearing the registered timers for a merged window (a window has been merged into the merging result). For my implementation, the trigger registers multiple timers for each element at Trigger#onElement(). State is used to keep tra