from:"yingjie"

Re: [DISCUSS] Change some default config values of blocking shuffle

2022-01-06 Thread Yingjie Cao

Hi all, Thanks very much for all of the feedbacks. It seems that we have reached a consensus. I will start a vote soon. Best, Yingjie Yun Gao 于2022年1月5日周三 16:08写道： > Very thanks @Yingjie for completing the experiments! > > Also +1 for changing the default config values. From the ex

Re: [DISCUSS] Change some default config values of blocking shuffle

2022-01-04 Thread Yingjie Cao

my limited knowledge, I guess that increasing the total TaskManager size and network memory size is important for performance, because more memory (managed and network) can make operators and shuffle faster. Best, Yingjie Yingjie Cao 于2021年12月15日周三 12:19写道： > Hi Till, > > Thanks fo

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-14 Thread Yingjie Cao

Hi Till, Thanks for the suggestion. I think it makes a lot of sense to also extend the documentation for the sort shuffle to include a tuning guide. Best, Yingjie Till Rohrmann 于2021年12月14日周二 18:57写道： > As part of this FLIP, does it make sense to also extend the documentation > for th

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao

st if a rough table could be provided. I think this is a good suggestion, we can provide those suggestions in the document. Best, Yingjie Jingsong Li 于2021年12月14日周二 14:39写道： > Hi Yingjie, > > +1 for this FLIP. I'm pretty sure this will greatly improve the ease > of batch jobs. >

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao

mory.framework.off-heap.batch-shuffle.size, you can increase these values for large-scale batch jobs. BTW, I am still running TPCDS tests these days and I can share these results soon. Best, Yingjie 刘建刚于2021年12月10日周五 18:30写道： > Glad to see the suggestion. In our test, we found that small jobs with t

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao

the second issue, I will try to find the cause and solve it in 1.15. I am open for your suggestion, but I still need some more tests and analysis to guarantee that it works well. Best, Yingjie Yun Gao 于2021年12月10日周五 17:19写道： > Hi Yingjie, > > Very thanks for drafting the FLIP and initia

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 Thread Yingjie Cao

Hi dev & users: I have created a FLIP [1] for it, feedbacks are highly appreciated. Best, Yingjie [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability Yingjie Cao 于2021年12月3日周五 17:02写道： > Hi dev

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-06 Thread Yingjie Cao

. It can be a very short one, though. Yes, you are right. I will prepare a simple FLIP soon. Best, Yingjie Till Rohrmann 于2021年12月3日周五 18:39写道： > Thanks for starting this discussion Yingjie, > > How will our tests be affected by these changes? Will Flink require more > res

[DISCUSS] Change some default config values of blocking shuffle

2021-12-03 Thread Yingjie Cao

nd both the performance and stability improved a lot. These changes can help to improve the out-of-box experience of blocking shuffle. What do you think about these changes? Is there any concern? If there are no objections, I will make these changes soon. Best, Yingjie

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-12-01 Thread Yingjie Cao

Hi Jiangang, Great to hear that, welcome to work together to make the project better. Best, Yingjie 刘建刚于2021年12月1日周三下午3:27写道： > Good work for flink's batch processing! > Remote shuffle service can resolve the container lost problem and reduce > the running time for batch jobs

[ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yingjie Cao

project, please refer to [1]. Before going open source, the project has been used widely in production and it behaves well on both stability and performance. We hope you enjoy it. Collaborations and feedbacks are highly appreciated. Best, Yingjie on behalf of all contributors [1] https://gi

[DISCUSS] FLIP-184: Refine ShuffleMaster lifecycle management for pluggable shuffle service framework

2021-07-13 Thread Yingjie Cao

. [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-184%3A+Refine+ShuffleMaster+lifecycle+management+for+pluggable+shuffle+service+framework . Best, Yingjie

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao

Maybe you can try to increase taskmanager.network.retries, taskmanager.network.netty.server.backlog and taskmanager.network.netty.sendReceiveBufferSize. These options are useful for our jobs. yidan zhao 于2021年6月16日周三下午7:10写道： > Hi, yingjie. > If the network is not stable, which

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao

/ops/batch/blocking_shuffle/ [2] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/ [3] https://issues.apache.org/jira/browse/FLINK-22643 Best, Yingjie yidan zhao 于2021年6月16日周三下午3:36写道： > Attachment is the exception stack from flink's web-ui. Does anyone > have

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Yingjie Cao

Hi Till, Thanks for the suggestion. The blog post is already on the way. Best, Yingjie Till Rohrmann 于2021年6月8日周二下午5:30写道： > Thanks for the update Yingjie. Would it make sense to write a short blog > post about this feature including some performance improvement numbers? I > think t

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-06 Thread Yingjie Cao

nt[2] which contains more implementation details. FYI. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-148%3A+Introduce+Sort-Based+Blocking+Shuffle+to+Flink [2] https://docs.google.com/document/d/1j12TkSqgf6dg3J48udA2MFrDOQccW24tzjn5pJlTQaQ/edit?usp=sharing Best, Yingjie Yingjie Ca

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-30 Thread Yingjie Cao

Hi Haihang, After scanning the user mailing list, I found some users have reported checkpoint timeout when using unaligned checkpoint, can you share which checkpoint mode do you use? (The information can be found in log or the checkpoint -> configuration tab in webui) Best, Yingjie Yingjie

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-30 Thread Yingjie Cao

share some more information, for example, the stack of the task which can not finish the checkpoint? Best, Yingjie Haihang Jing 于2021年3月25日周四上午10:58写道： > Hi，Congxian ,thanks for your replay. > job run on Flink1.9 （checkpoint interval 3min） > < > http://apache-flink-user-mailing

Re: What happens to the channels when there is backpressure?

2019-11-27 Thread yingjie cao

Hi Felipe, That depends on what do you mean by 'bandwidth'. If you mean the capability of the network stack, the answer would be no. Here is a post about Flink network stack which may help: https://flink.apache.org/2019/06/05/flink-network-stack.html. Thanks, Yingjie Felipe Gutierrez

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-27 Thread yingjie

Piotr is right, that depend on the data size you are reading and the memory pressure. Those memory occupied by mmapped region can be recycled and used by other processes if memory pressure is high, that is, other process or service on the same node won't be affected because the OS will recycle the

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread yingjie

The new BlockingSubpartition implementation in 1.9 uses mmap for data reading by default which means it steals memory from OS. The mmaped region memory is managed by JVM, so there should be no OutOfMemory problem reported by JVM and the OS memory is also not exhausted, so there should be no kernal

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

Re: [DISCUSS] Change some default config values of blocking shuffle

[DISCUSS] Change some default config values of blocking shuffle

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

[ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

[DISCUSS] FLIP-184: Refine ShuffleMaster lifecycle management for pluggable shuffle service framework

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

Re: What happens to the channels when there is backpressure?

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

21 matches

Site Navigation

Mail list logo

Footer information