[ https://issues.apache.org/jira/browse/FLINK-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115957#comment-17115957 ]
Roman Khachatryan commented on FLINK-16057:
-------------------------------------------

I was able to reproduce the above results both locally and on codespeed.dak8s.net (the *newer* version being faster).

The benchmark: [https://github.com/dataArtisans/flink-benchmarks/pull/62]

> Performance regression in ContinuousFileReaderOperator
> ------------------------------------------------------
>
>                 Key: FLINK-16057
>                 URL: https://issues.apache.org/jira/browse/FLINK-16057
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Runtime / Task
>    Affects Versions: 1.11.0
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> After switching CFRO to a single-threaded execution model, a performance
> regression of about 15-20% was expected (benchmarked in November).
> But after merging to master it turned out to be about 50%.
>
> One reason is that the chaining strategy isn't set by default in CFRO factory.
> Without that, even reading and outputting all records of a split in a single
> mail action doesn't reverse the regression (only about half of it).
> However, setting the strategy AND enabling batching fixes the regression
> (starting from batch size 6).
> Batching can't be used in practice, though, because it can significantly delay
> checkpointing.
>
> Another approach would be to process one record and then repeat until
> defaultMailboxActionAvailable OR haveNewMail.
> This reverses the regression and even improves the performance by about 50%
> compared to the old version.
>
> The final solution could also be FLIP-27.
>
> Other things tried (didn't help):
> * CFRO rework without subsequent commits (removing checkpoint lock)
> * different batch sizes, including the whole split, without chaining
> strategy fixed - partial improvement only
> * disabling close
> * disabling checkpointing
> * disabling output (serialization)
> * using LinkedList instead of PriorityQueue
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
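
For illustration, the "process one record and repeat until defaultMailboxActionAvailable OR haveNewMail" idea from the description might look roughly like the sketch below. This is not the actual CFRO or Flink mailbox code: `YieldingReaderSketch`, `Mailbox`, and `readWhileIdle` are hypothetical names, the split is modeled as a plain iterator of strings, and the two conditions from the issue are modeled as a `BooleanSupplier` and a `hasMail()` check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

/** Illustrative only: every name here is hypothetical, not Flink's API. */
final class YieldingReaderSketch {

    /** Minimal stand-in for a task mailbox: a FIFO queue of actions. */
    static final class Mailbox {
        private final Queue<Runnable> mails = new ArrayDeque<>();

        void put(Runnable mail) {
            mails.add(mail);
        }

        boolean hasMail() {
            return !mails.isEmpty();
        }

        /** Runs all queued mail in arrival order. */
        void drain() {
            Runnable mail;
            while ((mail = mails.poll()) != null) {
                mail.run();
            }
        }
    }

    /**
     * Emits at least one record (if the split has any), then keeps emitting
     * until the split is exhausted, new mail has arrived, or the default
     * action is available. Returns the number of records emitted.
     */
    static int readWhileIdle(
            Iterator<String> split,
            Mailbox mailbox,
            BooleanSupplier defaultActionAvailable,
            List<String> output) {
        int emitted = 0;
        while (split.hasNext()) {
            output.add(split.next());
            emitted++;
            if (mailbox.hasMail() || defaultActionAvailable.getAsBoolean()) {
                break; // yield so pending mail (e.g. a checkpoint) is not delayed
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Mailbox mailbox = new Mailbox();
        List<String> output = new ArrayList<>();
        Iterator<String> split = Arrays.asList("a", "b", "c", "d").iterator();

        // Mail is already pending, so the reader yields after a single record.
        mailbox.put(() -> System.out.println("checkpoint mail runs"));
        int first = readWhileIdle(split, mailbox, () -> false, output);
        mailbox.drain();

        // Nothing is pending now, so the rest of the split is read in one go.
        int second = readWhileIdle(split, mailbox, () -> false, output);
        System.out.println(first + " then " + second + " records: " + output);
    }
}
```

Unlike a fixed batch size, the loop gives up control as soon as other work shows up, which is presumably why this approach avoids the checkpointing delay attributed to batching in the description.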