[ https://issues.apache.org/jira/browse/FLINK-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113890#comment-17113890 ]
Roman Khachatryan commented on FLINK-16057:
-------------------------------------------

Got unexpected results on actual files: the newer version is faster (double-checking).

Old version ([http://codespeed.dak8s.net:8080/job/flink-benchmark-request/162/]):
{code:java}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: folder"
"org.apache.flink.benchmark.ContinuousFileReaderOperatorIoBenchmark.readFiles","thrpt",1,30,7352.674778,240.954385,"ops/ms",txt-100-1000-10
"org.apache.flink.benchmark.ContinuousFileReaderOperatorIoBenchmark.readFiles","thrpt",1,30,5783.989828,102.949992,"ops/ms",txt-1000-100-10
{code}

New version ([http://codespeed.dak8s.net:8080/job/flink-benchmark-request/163/]):
{code:java}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: folder"
"org.apache.flink.benchmark.ContinuousFileReaderOperatorIoBenchmark.readFiles","thrpt",1,30,16931.351736,551.851266,"ops/ms",txt-100-1000-10
"org.apache.flink.benchmark.ContinuousFileReaderOperatorIoBenchmark.readFiles","thrpt",1,30,6156.304362,92.567005,"ops/ms",txt-1000-100-10
{code}

> Performance regression in ContinuousFileReaderOperator
> ------------------------------------------------------
>
> Key: FLINK-16057
> URL: https://issues.apache.org/jira/browse/FLINK-16057
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream, Runtime / Task
> Affects Versions: 1.11.0
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.11.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> After switching CFRO to a single-threaded execution model, a performance
> regression of about 15-20% was expected (benchmarked in November).
> But after merging to master, it turned out to be about 50%.
>
> One reason is that the chaining strategy isn't set by default in the CFRO factory.
> Without it, even reading and outputting all records of a split in a single
> mail action doesn't reverse the regression (it recovers only about half).
> However, setting the strategy AND enabling batching does fix the regression
> (starting from a batch size of 6).
> Batching can't be used in practice, though, because it can significantly delay
> checkpointing.
>
> Another approach would be to process one record and then repeat until
> defaultMailboxActionAvailable OR haveNewMail.
> This reverses the regression and even improves performance by about 50%
> compared to the old version.
>
> The final solution could also be FLIP-27.
>
> Other things tried (didn't help):
> * CFRO rework without subsequent commits (removing checkpoint lock)
> * different batch sizes, including the whole split, without the chaining
> strategy fixed (partial improvement only)
> * disabling close
> * disabling checkpointing
> * disabling output (serialization)
> * using LinkedList instead of PriorityQueue
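The size of the unexpected speedup can be checked with a few lines of Java. This is a sketch only. SpeedupCheck and its methods are hypothetical names. The scores and errors are copied from the two CSV outputs in the comment (builds 162 and 163). The check reads JMH's "Score Error (99.9%)" column as the half-width of a 99.9% confidence interval around the score.

```java
import java.util.Locale;

/**
 * Hypothetical helper comparing the two JMH runs quoted in the comment (builds 162 and 163).
 * Assumes "Score Error (99.9%)" is the half-width of a 99.9% confidence interval.
 */
public class SpeedupCheck {

    /** Ratio of new to old throughput (ops/ms); a value above 1 means the new version is faster. */
    static double speedup(double oldScore, double newScore) {
        return newScore / oldScore;
    }

    /** True if the intervals [score - error, score + error] of the old and new runs overlap. */
    static boolean intervalsOverlap(double oldScore, double oldError, double newScore, double newError) {
        return oldScore - oldError <= newScore + newError
                && newScore - newError <= oldScore + oldError;
    }

    /** One summary line per benchmark parameter; Locale.ROOT keeps '.' as the decimal separator. */
    static String describe(String folder, double oldScore, double oldError, double newScore, double newError) {
        return String.format(Locale.ROOT, "%s: %.2fx, intervals overlap: %b",
                folder, speedup(oldScore, newScore),
                intervalsOverlap(oldScore, oldError, newScore, newError));
    }

    public static void main(String[] args) {
        // Score and error values copied from the two CSV outputs in the comment.
        System.out.println(describe("txt-100-1000-10", 7352.674778, 240.954385, 16931.351736, 551.851266));
        System.out.println(describe("txt-1000-100-10", 5783.989828, 102.949992, 6156.304362, 92.567005));
    }
}
```

Under this reading, the new build is roughly 2.3x faster on txt-100-1000-10 and about 6% faster on txt-1000-100-10. The error intervals do not overlap in either case.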
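The batching concern can be made concrete with a toy model of a single-threaded mailbox. This is a hypothetical sketch, not the CFRO code or Flink's mailbox classes. BatchedReadSketch, enqueueRead, and the ArrayDeque-based mailbox are made-up stand-ins. Each read mail emits up to batchSize records and then re-enqueues itself. A simulated checkpoint request that arrives during a batch runs only after the whole batch is emitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

/**
 * Hypothetical model of a single-threaded mailbox (not Flink's classes).
 * Each read mail emits up to batchSize records, then re-enqueues itself if records remain.
 */
public class BatchedReadSketch {
    private final ArrayDeque<Runnable> mailbox = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();
    int readMails = 0;

    private void enqueueRead(Iterator<Integer> split, int batchSize, int checkpointAfter) {
        mailbox.add(() -> {
            readMails++;
            for (int i = 0; i < batchSize && split.hasNext(); i++) {
                int record = split.next();
                log.add("record " + record);
                if (record == checkpointAfter) {
                    // Simulated checkpoint request arriving mid-batch: it is queued
                    // behind the current mail, so it waits for the rest of the batch.
                    mailbox.add(() -> log.add("checkpoint"));
                }
            }
            if (split.hasNext()) {
                enqueueRead(split, batchSize, checkpointAfter);
            }
        });
    }

    /** Runs the mailbox loop until no mail is left; returns the sketch for inspection. */
    static BatchedReadSketch run(int numRecords, int batchSize, int checkpointAfter) {
        BatchedReadSketch sketch = new BatchedReadSketch();
        Iterator<Integer> split = IntStream.rangeClosed(1, numRecords).iterator();
        sketch.enqueueRead(split, batchSize, checkpointAfter);
        Runnable mail;
        while ((mail = sketch.mailbox.poll()) != null) {
            mail.run();
        }
        return sketch;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 6}) {
            BatchedReadSketch s = run(10, batchSize, 1);
            System.out.println("batch size " + batchSize
                    + ": records emitted before checkpoint = " + s.log.indexOf("checkpoint")
                    + ", read mails = " + s.readMails);
        }
    }
}
```

In this model, a batch size of 1 needs 10 read mails for 10 records, and the checkpoint runs after 1 record. A batch size of 6 needs only 2 mails, but 6 records are emitted before the checkpoint. Fewer mails per record come at the cost of checkpoint latency.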
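The "process one record and then repeat until defaultMailboxActionAvailable OR haveNewMail" approach can be modeled the same way. This is also a hypothetical sketch that does not reproduce Flink's mailbox implementation. YieldingReadSketch is a made-up name. A non-empty ArrayDeque stands in for haveNewMail, and a BooleanSupplier stands in for defaultMailboxActionAvailable. In this toy model, that supplier always returns false.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.stream.IntStream;

/** Hypothetical model: emit one record, then keep going until other work shows up. */
public class YieldingReadSketch {
    private final ArrayDeque<Runnable> mailbox = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();
    int readMails = 0;

    private void enqueueRead(Iterator<Integer> split, BooleanSupplier defaultActionAvailable, int checkpointAfter) {
        mailbox.add(() -> {
            readMails++;
            if (!split.hasNext()) {
                return;
            }
            do {
                int record = split.next();
                log.add("record " + record);
                if (record == checkpointAfter) {
                    // Simulated checkpoint request; makes the mailbox non-empty.
                    mailbox.add(() -> log.add("checkpoint"));
                }
            } while (split.hasNext()
                    && mailbox.isEmpty() // stands in for !haveNewMail
                    && !defaultActionAvailable.getAsBoolean()); // stands in for !defaultMailboxActionAvailable
            if (split.hasNext()) {
                // Yield: let the pending mail run, then continue reading in a new mail.
                enqueueRead(split, defaultActionAvailable, checkpointAfter);
            }
        });
    }

    /** checkpointAfter = 0 means no checkpoint request is simulated. */
    static YieldingReadSketch run(int numRecords, int checkpointAfter) {
        YieldingReadSketch sketch = new YieldingReadSketch();
        Iterator<Integer> split = IntStream.rangeClosed(1, numRecords).iterator();
        sketch.enqueueRead(split, () -> false, checkpointAfter);
        Runnable mail;
        while ((mail = sketch.mailbox.poll()) != null) {
            mail.run();
        }
        return sketch;
    }

    public static void main(String[] args) {
        YieldingReadSketch s = run(10, 1);
        System.out.println("records emitted before checkpoint = " + s.log.indexOf("checkpoint")
                + ", read mails = " + s.readMails);
    }
}
```

Here the read mail yields right after the record during which the checkpoint request arrived. It then drains the rest of the split in a single mail while nothing else is waiting. The result is as responsive as emitting one record per mail, without paying for a mail per record when the mailbox is idle.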