Re: Adaptive load balancing

2020-09-23 Thread Zhijiang
custom partitioner which can control the logic of keyBy distribution based on pre-defined cache distribution in nodes? Best, Zhijiang -- From:Navneeth Krishnan Send Time:2020年9月23日(星期三) 02:21 To:user Subject:Adaptive load balancing

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Zhijiang
Congrats, Dian! -- From:Yun Gao Send Time:2020年8月27日(星期四) 17:44 To:dev ; Dian Fu ; user ; user-zh Subject:Re: Re: [ANNOUNCE] New PMC member: Dian Fu Congratulations Dian ! Best Yun ---

Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-27 Thread Zhijiang
Congrats, thanks for the release manager work Zhu Zhu and everyone involved in! Best, Zhijiang -- From:liupengcheng Send Time:2020年8月26日(星期三) 19:37 To:dev ; Xingbo Huang Cc:Guowei Ma ; user-zh ; Yangze Guo ; Dian Fu ; Zhu Zhu

Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Zhijiang
Thanks for being the release manager and the efficient work, Dian! Best, Zhijiang -- From:Konstantin Knauf Send Time:2020年7月22日(星期三) 19:55 To:Till Rohrmann Cc:dev ; Yangze Guo ; Dian Fu ; user ; user-zh Subject:Re: [ANNOUNCE

[ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Zhijiang
://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364 We would like to thank all contributors of the Apache Flink community who made this release possible! Cheers, Piotr & Zhijiang

Re: Unaligned Checkpoint and Exactly Once

2020-06-21 Thread Zhijiang
:2020年6月22日(星期一) 10:53 To:Zhijiang ; user@flink.apache.org Subject:回复: Unaligned Checkpoint and Exactly Once Thank you Zhijiang! The second question about config is just because I find a method in InputProcessorUtil. I guess AT_LEAST_ONCE mode is a simpler way to handle checkpont barrier

Re: Unaligned Checkpoint and Exactly Once

2020-06-21 Thread Zhijiang
using the first config form. But somehow they seem two different dimensions for config the checkpoint. One is for the semantic of data processing guarantee. And the other is for how we realize two different mechanisms to guarantee one (exactly-once) of the semantics. Best, Zhijiang

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Zhijiang
Congratulations Yu! Well deserved! Best, Zhijiang -- From:Dian Fu Send Time:2020年6月17日(星期三) 10:48 To:dev Cc:Haibo Sun ; user ; user-zh Subject:Re: [ANNOUNCE] Yu Li is now part of the Flink PMC Congrats Yu! Regards, Dian >

Re: Blocked requesting MemorySegment when Segments are available.

2020-06-10 Thread Zhijiang
n" operator is more than 0? I want to get ride of the factors of buffer leak on upstream side and without partition request on downstream side. Then we can further allocate whether the input availability notification on downstream s

Re: Flink 1.10 memory and backpressure

2020-06-10 Thread Zhijiang
Sorry for missing the document link [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/back_pressure.html -- From:Zhijiang Send Time:2020年6月11日(星期四) 11:32 To:Steven Nelson ; user Subject:Re: Flink 1.10 mem

Re: Flink 1.10 memory and backpressure

2020-06-10 Thread Zhijiang
Regarding the monitor of backpressure, you can refer to the document [1]. As for debugging the backpressure, one option is to trace the jstack of respective window task thread which causes the backpressure(almost has the maximum inqueue buffers). After frequent tracing the jstack, you might find

Re: Singal task backpressure problem with Credit-based Flow Control

2020-05-24 Thread Zhijiang
am not sure why you choose rescale to shuffle data among operators. The default forward mode can gain really good performance by default if you adjusting the same parallelism among them. Best, Zhijiang -- From:Weihua Hu Send Time

Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-18 Thread Zhijiang
Thanks Yu for the release manager and everyone involved in. Best, Zhijiang -- From:Arvid Heise Send Time:2020年5月18日(星期一) 23:17 To:Yangze Guo Cc:dev ; Apache Announce List ; user ; Yu Li ; user-zh Subject:Re: [ANNOUNCE] Apache

Re: Flink Streaming Job Tuning help

2020-05-12 Thread Zhijiang
work related metrics. Whether there are data skew in your case, that means some task would process more records than others. If so, maybe we can increase the parallelism to balance the load. Best, Zhijiang -- From:Senthil Kumar Send

Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-27 Thread Zhijiang
Thanks Dian for the release work and thanks everyone involved. Best, Zhijiang -- From:Till Rohrmann Send Time:2020 Apr. 27 (Mon.) 15:13 To:Jingsong Li Cc:dev ; Leonard Xu ; Benchao Li ; Konstantin Knauf ; jincheng sun ; Hequn

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-09 Thread Zhijiang
Great work! Thanks Gordon for the continuous efforts for enhancing stateful functions and the efficient release! Wish stateful functions becoming more and more popular in users. Best, Zhijiang -- From:Yun Tang Send Time:2020 Apr

Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Zhijiang
Thanks for the continuous efforts for engaging in Flink ecosystem Jeff! Glad to see the progressive achievement. Wish more users try it out in practice. Best, Zhijiang -- From:Dian Fu Send Time:2020 Mar. 31 (Tue.) 10:15 To:Jeff

Re: End to End Latency Tracking in flink

2020-03-29 Thread Zhijiang
Hi Lu, Besides Congxian's replies, you can also get some further explanations from "https://flink.apache.org/2019/07/23/flink-network-stack-2.html#latency-tracking";. Best, Zhijiang -- From:Congxian Qiu Send Ti

Re: FlinkCEP - Detect absence of a certain event

2020-03-18 Thread Zhijiang
Hi Humberto, I guess Fuji is familiar with Flink CEP and he can answer your proposed question. I already cc him. Best, Zhijiang -- From:Humberto Rodriguez Avila Send Time:2020 Mar. 18 (Wed.) 17:31 To:user Subject:FlinkCEP

Re: How do I get the outPoolUsage value inside my own stream operator?

2020-03-18 Thread Zhijiang
skMetricGroup. Hope it solve your problem. Best, Zhijiang -- From:Felipe Gutierrez Send Time:2020 Mar. 17 (Tue.) 17:50 To:user Subject:Re: How do I get the outPoolUsage value inside my own stream operator? Hi, just for the recor

Re: Help me understand this Exception

2020-03-18 Thread Zhijiang
e from the wrapped real exception, then users can easily get the root cause directly, not only for the current message "Could not forward element to next operator". Best, Zhijiang -- From:Tzu-Li (Gordon) Tai Send Time:20

Re: Backpressure and 99th percentile latency

2020-03-07 Thread Zhijiang
be increased along with the trend of increased input&outputQueueLength and input&outputPoolUsage. All of them should be proportional to have the same trend in most cases. Best, Zhijiang -- From:Felipe Gutierrez Send

Re: Backpressure and 99th percentile latency

2020-03-05 Thread Zhijiang
ssure, it should go down to milliseconds. Best, Zhijiang -- From:Felipe Gutierrez Send Time:2020 Mar. 6 (Fri.) 05:04 To:user Subject:Backpressure and 99th percentile latency Hi, I am a bit confused about the topic of tracking la

Re: Question: Determining Total Recovery Time

2020-02-26 Thread Zhijiang
metrics are `0` in light-weight situation, which i mentioned above. So we can not estimate the saturation unless we increase the source emit. Wish good news sharing from you! Best, Zhijiang -- From:Arvid Heise Send Time:2020 Feb

Re: How JobManager and TaskManager find each other?

2020-02-26 Thread Zhijiang
ld be sent to the required peers during task schedule and deployment. Best, Zhijiang -- From:KristoffSC Send Time:2020 Feb. 26 (Wed.) 19:39 To:user Subject:Re: How JobManager and TaskManager find each other? Thanks all for the an

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Zhijiang
Congrats Jingsong! Welcome on board! Best, Zhijiang -- From:Zhenghua Gao Send Time:2020 Feb. 21 (Fri.) 12:49 To:godfrey he Cc:dev ; user Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer Congrats Jingsong! Best

Re: Re: The mapping relationship between Checkpoint subtask id and Task subtask id

2020-02-13 Thread Zhijiang
BTW, the FLIP-75 is going for the user experience of web UI. @Yadong Xiehave we already considered this issue to unify the ids in different parts in FLIP-75? Best, Zhijiang -- From:Zhijiang Send Time:2020 Feb. 14 (Fri.) 13:03

Re: Re: The mapping relationship between Checkpoint subtask id and Task subtask id

2020-02-13 Thread Zhijiang
Let's move the further discussion onto the jira page. I have not much time recently for working on this. If you want to take it, I can assign it to you and help review the PR if have time then. Or I can find other possible guys work on it future. Best, Zhi

Re: The mapping relationship between Checkpoint subtask id and Task subtask id

2020-02-13 Thread Zhijiang
If the id is not consistent in different parts, maybe it is worth creating a jira ticket for better improving the user experience. If anyone wants to work on it, please ping me then I can give a hand. Best, Zhijiang -- From:Yun Tang

Re: Encountered error while consuming partitions

2020-02-13 Thread Zhijiang
Best, Zhijiang -- From:张光辉 Send Time:2020 Feb. 12 (Wed.) 22:19 To:Benchao Li Cc:刘建刚 ; user Subject:Re: Encountered error while consuming partitions Network can fail in many ways, sometimes pretty subtle (e.g. high ratio packet

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Zhijiang
Really great work and thanks everyone involved, especially for the release managers! Best, Zhijiang -- From:Kurt Young Send Time:2020 Feb. 13 (Thu.) 11:06 To:[None] Cc:user ; dev Subject:Re: [ANNOUNCE] Apache Flink 1.10.0 released

Re: Exactly once semantics for hdfs sink

2020-02-11 Thread Zhijiang
Hi Vishwas, I guess this link [1] can help understand how it works and how to use in practice for StreamingFileSink. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html Best, Zhijiang

Re: How can I find out which key group belongs to which subtask

2020-01-10 Thread Zhijiang
implemented some enhancements in scheduler layer to support such requirement in release-1.10. You can have a try when the rc candidate is ready. Best, Zhijiang -- From:杨东晓 Send Time:2020 Jan. 10 (Fri.) 02:10 To:Congxian Qiu Cc:user

Re: How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread Zhijiang
rming whether it is happening, but maybe it is not very convenient. Another possible way is via the checkpoint metrics which would record the sync/async duration time, maybe it can also satisfy your requirements. Best, Zhi

Re: Aggregating Movie Rental information in a DynamoDB table using DynamoDB streams and Flink

2019-12-25 Thread Zhijiang
addition, the StreamingFileSink also implements the exactly-once for sink. You might also refer to it to get some insights if possible. Best, Zhijiang -- From:Joe Hansen Send Time:2019 Dec. 26 (Thu.) 01:42 To:user Subject:Aggregating

Re: Rewind offset to a previous position and ensure certainty.

2019-12-25 Thread Zhijiang
not support such operation/function atm. :) [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/event_timestamps_watermarks.html Best, Zhijiang -- From:邢瑞斌 Send Time:2019 Dec. 25 (Wed.) 20:27 To:user-zh ; user

Re: Flink task node shut it self off.

2019-12-24 Thread Zhijiang
. -- From:John Smith Send Time:2019 Dec. 25 (Wed.) 03:40 To:Zhijiang Cc:user Subject:Re: Flink task node shut it self off. The shutdown happened after the massive IO wait. I don't use any state Checkpoints are disk based... On Mon., Dec. 23, 2019, 1:42 a.m. Zhijiang, wrote: Hi

Re: Flink task node shut it self off.

2019-12-22 Thread Zhijiang
increase your task manager memory. But if you can analyze the dump hs_err file via some profiler tool for checking the memory usage, it might be more helpful to find the root cause. Best, Zhijiang -- From:John Smith Send Time:2019 Dec

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-21 Thread Zhijiang
. // ah From: Piotr Nowojski On Behalf Of Piotr Nowojski Sent: Thursday, November 21, 2019 10:14 AM To: Hailu, Andreas [Engineering] Cc: Zhijiang ; user@flink.apache.org Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1? Hi, I would suspect this: https://issues.apache.org

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-20 Thread Zhijiang
stack, especially it really spans several releases. Best, Zhijiang -- From:Hailu, Andreas Send Time:2019 Nov. 21 (Thu.) 01:03 To:user@flink.apache.org Subject:RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1

Re: How can I get the backpressure signals inside my function or operator?

2019-11-06 Thread Zhijiang
monitoring/rest_api.html Best, Zhijiang -- From:Felipe Gutierrez Send Time:2019 Nov. 7 (Thu.) 00:06 To:Chesnay Schepler Cc:Zhijiang ; user Subject:Re: How can I get the backpressure signals inside my function or operator? If I can t

Re: What metrics can I see the root cause of "Buffer pool is destroyed" message?

2019-11-06 Thread Zhijiang
lure which triggers the following cancel operations. In addition, which flink version are you using? Best, Zhijiang -- From:Felipe Gutierrez Send Time:2019 Nov. 6 (Wed.) 19:12 To:user Subject:What metrics can I see the root cause

Re: How can I get the backpressure signals inside my function or operator?

2019-11-05 Thread Zhijiang
x27;s metric of `Shuffle.Netty.Input.Buffers.inputQueueLength` on preAggregate side, you might rely on some external metric reporter to query it if possible. Best, Zhijiang -- From:Felipe Gutierrez Send Time:2019 Nov. 5 (Tue.

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread zhijiang
Congratulations Zili! -- From:Becket Qin Send Time:2019年9月12日(星期四) 03:43 To:Paul Lam Cc:Rong Rong ; dev ; user Subject:Re: [ANNOUNCE] Zili Chen becomes a Flink committer Congrats, Zili! On Thu, Sep 12, 2019 at 9:39 AM Paul Lam wr

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread zhijiang
Congratulations Andrey, great work and well deserved! Best, Zhijiang -- From:Till Rohrmann Send Time:2019年8月14日(星期三) 15:26 To:dev ; user Subject:[ANNOUNCE] Andrey Zagrebin becomes a Flink committer Hi everyone, I'm very hap

Re: Will broadcast stream affect performance because of the absence of operator chaining?

2019-08-06 Thread zhijiang
your acception or not. If the performance is not reaching your requirements, we could further consider other improvements. Best, Zhijiang -- From:Piotr Nowojski Send Time:2019年8月6日(星期二) 14:55 To:黄兆鹏 Cc:user Subject:Re: Will

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread zhijiang
Congratulations Rong! Best, Zhijiang -- From:Kurt Young Send Time:2019年7月11日(星期四) 22:54 To:Kostas Kloudas Cc:Jark Wu ; Fabian Hueske ; dev ; user Subject:Re: [ANNOUNCE] Rong Rong becomes a Flink committer Congratulations Rong

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-24 Thread zhijiang
final decision. Best, Zhijiang -- From:Joshua Fan Send Time:2019年6月25日(星期二) 11:10 To:zhijiang Cc:Chesnay Schepler ; user ; Till Rohrmann Subject:Re: Maybe a flink bug. Job keeps in FAILING state Hi Zhijiang Thank you for your

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-21 Thread zhijiang
exit to solve the potential issue. Best, Zhijiang -- From:Chesnay Schepler Send Time:2019年6月21日(星期五) 16:34 To:zhijiang ; Joshua Fan Cc:user ; Till Rohrmann Subject:Re: Maybe a flink bug. Job keeps in FAILING state The logs are at

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-21 Thread zhijiang
to check the task final state. Best, Zhijiang -- From:Joshua Fan Send Time:2019年6月20日(星期四) 11:55 To:zhijiang Cc:user ; Till Rohrmann ; Chesnay Schepler Subject:Re: Maybe a flink bug. Job keeps in FAILING state zhijiang I did not

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-19 Thread zhijiang
As long as one task is in canceling state, then the job status might be still in canceling state. @Joshua Do you confirm all of the tasks in topology were already in terminal state such as failed or canceled? Best, Zhijiang

Re: A little doubt about the blog A Deep To Flink's NetworkStack

2019-06-16 Thread zhijiang
/browse/FLINK-10462 Best, Zhijiang -- From:aitozi Send Time:2019年6月16日(星期日) 22:19 To:user Subject:A little doubt about the blog A Deep To Flink's NetworkStack Hi, community I read this blog A Deep To Flink's NetworkSt

Re: java.io.FileNotFoundException in implementing exactly once

2019-06-11 Thread zhijiang
you could double check this dir for the issue. In addition I suggestt you upgrading the flink version because flink-1.3.3 is too old. After upgrading to flink-1.5 above, you do not need to consider this issue, because the exactly-once mode would not spill data to disk any more. Best, Zhijiang

Re: Apache Flink - Disabling system metrics and collecting only specific metrics

2019-06-11 Thread zhijiang
could implement a custom MetricReporter, and then only consentrate on your required application metrics in the method of `MetricReporter#notifyOfAddedMetric` to show them in backend. Best, Zhijiang -- From:M Singh Send Time:2019年6月

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread zhijiang
It is reasonable as stephan explained. +1 from my side! -- From:Jeff Zhang Send Time:2019年6月11日(星期二) 22:11 To:Stephan Ewen Cc:user ; dev Subject:Re: [DISCUSS] Deprecate previous Python APIs +1 Stephan Ewen 于2019年6月11日周二 下午9:30写道

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-06-04 Thread zhijiang
ishna" Cc:Nico Kruber ; user@flink.apache.org ; "Chan, Regina" Subject:RE: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks Thanks Zhijiang. Can you point us to the JIRA for your fix? Regards, -Rahul From: zhijiang Sent: Tue

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-06-04 Thread zhijiang
for review atm. You could pick the code in PR to verfiy the results if you like. And the next release-1.8.1 might cover this fix as well. Best, Zhijiang -- From:Erai, Rahul Send Time:2019年6月4日(星期二) 15:50 To:zhijiang ; Aljoscha

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-05-21 Thread zhijiang
, Zhijiang -- From:Narayanaswamy, Krishna Send Time:2019年5月22日(星期三) 00:49 To:zhijiang ; Aljoscha Krettek ; Piotr Nowojski Cc:Nico Kruber ; user@flink.apache.org ; "Chan, Regina" ; "Erai, Rahul" Subject:RE

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-05-21 Thread zhijiang
Hi Krishna, Could you show me or attach the jstack for the single slot case? Or is it the same jstack as before? Best, Zhijiang -- From:Narayanaswamy, Krishna Send Time:2019年5月21日(星期二) 19:50 To:zhijiang ; Aljoscha Krettek

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-05-17 Thread zhijiang
bugs in previous flink versions. [1] https://issues.apache.org/jira/browse/FLINK-12544 Best, Zhijiang -- From:Narayanaswamy, Krishna Send Time:2019年5月17日(星期五) 19:00 To:zhijiang ; Aljoscha Krettek ; Piotr Nowojski Cc:Nico Kruber

Re: 回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2019-05-17 Thread zhijiang
one task triggers another task to release memory in the same TM. Or you could increase the network buffer setting to work aournd, but not sure this way could work for your case because it is up to the total data size the source produced. Best, Zhijiang

Re: Netty channel closed at AKKA gated status

2019-04-21 Thread zhijiang
Hi Wenrui, I think you could trace the log of node manager which contains the lifecycle of this task executor. Maybe this task executor is killed by node manager because of memory overuse. Best, Zhijiang -- From:Wenrui Meng Send

Re: Netty channel closed at AKKA gated status

2019-04-15 Thread zhijiang
Hi Wenrui, You might further check whether there exists network connection issue between job master and target task executor if you confirm the target task executor is still alive. Best, Zhijiang -- From:Biao Liu Send Time:2019年4

Re: Can back pressure data be gathered by Flink metric system?

2019-04-15 Thread zhijiang
length/usage on consumer side. Although it is not very accurate sometimes, it could provide some hints of backpressure, because the outqueue and inqueue should be filled with buffers between producer and consumer when backpressure occurs. Best, Zhijiang

Re: Can back pressure data be gathered by Flink metric system?

2019-04-14 Thread zhijiang
Hi Henry, The backpressure tracking is not realized in metric framework, you could check the details via [1]. I am not sure why your requirements is showing backpressure in metrics. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html Best, Zhijiang

Re: Retain metrics counters across task restarts

2019-04-14 Thread zhijiang
restarted. Best, Zhijiang -- From:Peter Zende Send Time:2019年4月14日(星期日) 00:25 To:user Subject:Retain metrics counters across task restarts Hi all We're exposing Prometheus metrics from our Flink (v1.7.1) pipeline to Prome

Re: Netty channel closed at AKKA gated status

2019-04-14 Thread zhijiang
Hi Wenrui, I think the akka gated issue and inactive netty channel are both caused by some task manager exits/killed. You should double check the status and reason of this task manager `'athena592-phx2/10.80.118.166:44177'`. Best

Re: Question regarding "Insufficient number of network buffers"

2019-04-11 Thread zhijiang
ted. Best, Zhijiang -- From:Xiangfeng Zhu Send Time:2019年4月12日(星期五) 08:03 To:user Subject:Question regarding "Insufficient number of network buffers" Hello, My name is Allen, and I'm currently researching different distr

Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread zhijiang
Cool! Finally see the FLINK 1.8.0 release. Thanks Aljoscha for this excellent work and efforts for other contributors. We would continue working hard for FLINK 1.9.0 Best, Zhijiang -- From:vino yang Send Time:2019年4月10日(星期三) 17

Re: Flink credit based flow control

2019-03-11 Thread zhijiang
what is the flush timeout you config. Also you can trace the current metrics of outqueue.usages|length and inqueue.usags|length to find something. Best, Zhijiang -- From:Brian Ramprasad Send Time:2019年3月12日(星期二) 03:47 To:user

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-03 Thread zhijiang
queued in front of barriers. This is the right way to try and wish your solution with 2 parameters work. Best, Zhijiang -- From:LINZ, Arnaud Send Time:2019年3月2日(星期六) 16:45 To:zhijiang ; user Subject:RE: Checkpoints and catch-up

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-02-28 Thread zhijiang
one single downstream task (`a` is the parallelism of source vertex), because it is all-to-all connection. The barrier alignment takes more time in rebalance mode than forward mode. Best, Zhijiang -- From:LINZ, Arnaud Send Time

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
which operation delays the task to cause the backpressure, and this operation might be involved with HDFS. :) Best, Zhijiang -- From:Paul Lam Send Time:2019年2月28日(星期四) 19:17 To:zhijiang Cc:user Subject:Re: Flink performance drops

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-02-28 Thread zhijiang
is suitable for your scenario and may have a try. Best, Zhijiang -- From:LINZ, Arnaud Send Time:2019年2月28日(星期四) 17:28 To:user Subject:Checkpoints and catch-up burst (heavy back pressure) Hello, I have a simple streaming app that get da

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
task TPS are not decreased for a period as before, I think we could confirm the above analysis. :) Best, Zhijiang -- From:Paul Lam Send Time:2019年2月28日(星期四) 15:17 To:user Subject:Flink performance drops when async checkpoint is

Re: Confusion in Heartbeat configurations

2019-02-18 Thread zhijiang
implementation prefix with `akka` and the other is flink internal implementation. Best, Zhijiang -- From:sohimankotia Send Time:2019年2月18日(星期一) 14:40 To:user Subject:Confusion in Heartbeat configurations Hi, In https://ci.apache.org

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread zhijiang
the interested one and handle the progress of it. Best, Zhijiang -- From:Jeff Zhang Send Time:2019年2月14日(星期四) 18:03 To:Stephan Ewen Cc:dev ; user ; jincheng sun ; Shuyi Chen ; Rong Rong Subject:Re: [DISCUSS] Adding a mid-term

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread zhijiang
Congrats Thomas! Best, Zhijiang -- From:Kostas Kloudas Send Time:2019年2月12日(星期二) 22:46 To:Jark Wu Cc:Hequn Cheng ; Stefan Richter ; user Subject:Re: [ANNOUNCE] New Flink PMC member Thomas Weise Congratulations Thomas! Best

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread zhijiang
number of netty thread and timeout should make sense for normal cases. Best, Zhijiang -- From:Wenrui Meng Send Time:2019年1月9日(星期三) 18:18 To:Till Rohrmann Cc:user ; Konstantin Subject:Re: ConnectTimeoutException when

Re: Buffer stats when Back Pressure is high

2019-01-07 Thread zhijiang
B is empty. Best, Zhijiang -- From:Gagan Agrawal Send Time:2019年1月7日(星期一) 12:06 To:user Subject:Buffer stats when Back Pressure is high Hi, I want to understand does any of buffer stats help in debugging / validating that

回复:buffer pool is destroyed

2018-12-21 Thread zhijiang
Hi Shuang, Normally this exception you mentioned is not the root cause of failover, and it is mainly caused by cancel process to make task exit. You can further check whether there are other failures in job master log to find the root cause. Best, Zhijiang

回复:回复:Flink job failing due to "Container is running beyond physical memory limits" error.

2018-11-26 Thread zhijiang
It may work aournd by increasing the task manager memory size. The recover failure is up to serveral issues, whether it had successful checkpoint before, the states are available and what is the failover strategy? Best, Zhijiang

回复:Flink job failing due to "Container is running beyond physical memory limits" error.

2018-11-25 Thread zhijiang
I think it is probably related with rockdb memory usage if you have not found OutOfMemory issue before. There already existed a jira ticket [1] for fixing this issue, and you can watch it for updates. :) [1] https://issues.apache.org/jira/browse/FLINK-10884 Best, Zhijiang

回复:OutOfMemoryError while doing join operation in flink

2018-11-22 Thread zhijiang
So you can decrease the "taskmanager.memory.fraction" in low fraction or increase the total task manager to cover this overhead memories, or set one slot for each task manager. Best, Zhijiang -- 发件人:Akshay Mendole 发送时间

回复:OutOfMemoryError while doing join operation in flink

2018-11-22 Thread zhijiang
, Zhijiang -- 发件人:Akshay Mendole 发送时间:2018年11月22日(星期四) 13:43 收件人:user 主 题:OutOfMemoryError while doing join operation in flink Hi, We are converting one of our pig pipelines to flink using apache beam. The pig pipeline reads two

回复:checkpoint/taskmanager is stuck, deadlock on LocalBufferPool

2018-10-23 Thread zhijiang
output buffers are not consumed by downstream tasks. I think you can check the downstream task which inqueue usage should reach 100%, then jstack the corresponding downstream tasks that may stuck in some operations to cause back pressure. Best, Zhijiang

回复:Need help to understand memory consumption

2018-10-16 Thread Zhijiang(wangzhijiang999)
The operators for stream jobs will not use memory management which is only for batch jobs as you said. I guess the initial feedback is for batch jobs from the description? -- 发件人:Paul Lam 发送时间:2018年10月17日(星期三) 14:35 收件人:Zhijiang

回复:Need help to understand memory consumption

2018-10-16 Thread Zhijiang(wangzhijiang999)
not recycled by gc, so you will see no changes in memory consumption. After you restart the TaskManager, the initial memory consumption is low because of lazy allocating via taskmanager.memory.preallocate=false. Best, Zhijiang -

回复:What are channels mapped to?

2018-10-11 Thread Zhijiang(wangzhijiang999)
will be consumed by the first reduce task. Best, Zhijiang -- 发件人:Chris Miller 发送时间:2018年10月11日(星期四) 16:54 收件人:user 主 题:What are channels mapped to? Hi, in the OutputEmitter, the output channel can be selected in different manner. eg

回复:Small checkpoint data takes too much time

2018-10-09 Thread Zhijiang(wangzhijiang999)
backpressure. You can check the metric of "checkpointAlignmentTime" for confirmation. Best, Zhijiang -- 发件人:徐涛 发送时间:2018年10月10日(星期三) 13:13 收件人:user 主 题:Small checkpoint data takes too much time Hi I recently encounter a

回复:Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2018-10-08 Thread Zhijiang(wangzhijiang999)
. Or I guess we can set the ExecutionMode#PIPELINED_FORCED to not generate blocking result partition to avoid this issue temporarily. Best, Zhijiang -- 发件人:Piotr Nowojski 发送时间:2018年10月4日(星期四) 21:54 收件人:Aljoscha Krettek 抄 送

回复:[DISCUSS] Dropping flink-storm?

2018-09-28 Thread Zhijiang(wangzhijiang999)
Very agree with to drop it. +1 -- 发件人:Jeff Carter 发送时间:2018年9月29日(星期六) 10:18 收件人:dev 抄 送:chesnay ; Till Rohrmann ; user 主 题:Re: [DISCUSS] Dropping flink-storm? +1 to drop it. On Fri, Sep 28, 2018, 7:25 PM Hequn Cheng wrote: > H

回复:回复:InpoolUsage & InpoolBuffers inconsistence

2018-09-18 Thread Zhijiang(wangzhijiang999)
3. I suggest you upgrading the version to 1.5 or above. The checkpoint process may be faster than the current version. Best, Zhijiang -- 发件人:aitozi 发送时间:2018年9月18日(星期二) 14:53 收件人:user 主 题:Re: 回复:InpoolUsage & InpoolBuffers inconsist

回复:InpoolUsage & InpoolBuffers inconsistence

2018-09-17 Thread Zhijiang(wangzhijiang999)
%, there may be a lot of buffers queued in front of barrier, so the checkpoint may expire as you said. Best, Zhijiang -- 发件人:aitozi 发送时间:2018年9月18日(星期二) 12:59 收件人:user 主 题:Re: InpoolUsage & InpoolBuffers inconsistence And my d

回复:Flink application down due to RpcTimeout exception

2018-09-12 Thread Zhijiang(wangzhijiang999)
. 2. You can increase the default value of rpc timeout parameter(akka.ask.timeout) to work around temporarily. Best, Zhijiang -- 发件人:徐涛 发送时间:2018年9月13日(星期四) 14:10 收件人:user 主 题:Flink application down due to RpcTimeout exception Hi

回复:Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed.

2018-09-07 Thread Zhijiang(wangzhijiang999)
, Zhijiang -- 发件人:杨力 发送时间:2018年9月7日(星期五) 13:09 收件人:user 主 题:Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed. Hi all, I am encountering a weird problem when running flink 1.6 in yarn per-job clusters. The job

回复:Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
You can check the log to show the related stack in OOM, maybe we can confirm some reasons. Or you can dump the heap to analyze the memory usages after OOM. Best, Zhijiang -- 发件人:Darshan Singh 发送时间:2018年8月29日(星期三) 19:22 收件人

回复:Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
(groupby in your case) or decrease the parallelism of fast node(source in your case). Best, Zhijiang -- 发件人:Darshan Singh 发送时间:2018年8月29日(星期三) 18:16 收件人:chesnay 抄 送:wangzhijiang999 ; user 主 题:Re: Backpressure? for Batches Thanks, Now

回复:Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
I remember, that means the downstream will be scheduled after upstream finishes, so the slower downstream will not block upstream running, then the backpressure may not exist in this case. Best, Zhijiang -- 发件人:Darshan Singh 发送时间

回复:Kryo Serialization Issue

2018-08-28 Thread Zhijiang(wangzhijiang999)
buffers in record serializers. If the record size is large and the downstream parallelism is large, it may cause OOM issue in serialization. Could you show the stack of OOM part? If it is this case, the following [1] can solve it and it is working in progress. Zhijiang [1] https

  1   2   >