[ https://issues.apache.org/jira/browse/FLINK-24459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424951#comment-17424951 ]
Martijn Visser commented on FLINK-24459:
----------------------------------------

[~trushev] Thanks for the report and metrics. I'll bring this up with the team, much appreciated

> Performance improvement of file sink on Nexmark
> -----------------------------------------------
>
>                 Key: FLINK-24459
>                 URL: https://issues.apache.org/jira/browse/FLINK-24459
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: after.jfr.zip, after_cpu.png, after_mem.png,
> before.jfr.zip, before_cpu.png, before_mem.png
>
>
> h3. Context
> {{PartitionPathUtils.escapePathName}} is a pretty simple method that takes
> {{String}}, allocates {{StringBuilder}}, appends original or escaped chars,
> and outputs the result {{String}}.
> Filesystem sink calls the method several times for each element to determine
> bucket id. Because of this, it is a hot spot on a workload that writes
> intensively to filesystem, such as [nexmark
> q10|https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/q10.sql].
> On my local machine escaping of chars takes 9.53% CPU and 17.8% mem
> allocations of the whole TaskManager process.
> h3. Proposal
> {{PartitionPathUtils.escapePathName}} improvements
> # Use more efficient {{Integer.toHexString}} instead of {{String.format}}
> # Do not allocate new string when there is no escapable char in the original
> string
> # Allocate {{StringBuilder}} depending on the original string length instead
> of the default value
> h3. Benefit
> Experiment on local machine.
> 1 TaskManager with 6 slots. Job parallelism 6. Nexmark default configuration
> + object reuse option.
> Before: flink-1.14.0
> After: flink-1.14.0 + patch with the improvements
> || Nexmark q10 || Before || After ||
> | CPU samples of escapePathName() (% of all) | 9.53 | 1.64 |
> | Memory allocations by escapePathName() (% of all) | 17.8 | 2.98 |
> | Throughput/Cores (K/s) | 107.64 | 119.42 |
> Diff: CPU *-7.89*%, Memory *-14.82*%, Throughput *+10.9*%
> Profiling reports are in the attachment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
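
For readers of the quoted issue, the three proposed changes to escapePathName can be sketched roughly as follows. This is only an illustration under assumptions, not the actual patch: the class name EscapePathSketch, the set of escaped characters, and the %XX output format are hypothetical stand-ins, and the real PartitionPathUtils may differ in all of them.

```java
import java.util.BitSet;
import java.util.Locale;

final class EscapePathSketch {
    // Hypothetical escape set, chosen for illustration only.
    private static final BitSet CHAR_TO_ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            CHAR_TO_ESCAPE.set(c); // control characters
        }
        for (char c : new char[] {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\'}) {
            CHAR_TO_ESCAPE.set(c);
        }
    }

    static boolean needsEscaping(char c) {
        return CHAR_TO_ESCAPE.get(c);
    }

    // Change 1: Integer.toHexString instead of String.format.
    // Assumes a '%' followed by two uppercase hex digits as the escape form.
    private static void escapeChar(char c, StringBuilder sb) {
        sb.append('%');
        if (c < 16) {
            sb.append('0');
        }
        sb.append(Integer.toHexString(c).toUpperCase(Locale.ROOT));
    }

    static String escapePathName(String path) {
        StringBuilder sb = null; // allocated lazily, only if an escape is needed
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                if (sb == null) {
                    // Change 3: size the builder from the input length.
                    sb = new StringBuilder(path.length() + 2);
                    sb.append(path, 0, i); // copy the unescaped prefix
                }
                escapeChar(c, sb);
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // Change 2: return the original string when nothing was escaped.
        return sb == null ? path : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePathName("dt=2021-10-06")); // contains '='
        System.out.println(escapePathName("plain"));         // returned unchanged
    }
}
```

In this sketch the common case (no escapable characters) performs no allocation at all, which is where the reduction in memory allocations reported in the table would be expected to come from.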