[ https://issues.apache.org/jira/browse/KAFKA-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Bryński updated KAFKA-6626:
----------------------------------
    Description:
Kafka Connect is using IdentityHashMap for storing records.
[https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L239]

Unfortunately this solution is very slow (2-4 times slower than normal HashMap / HashSet).

Benchmark result (code in attachment).
{code:java}
Identity 4220
Set 2115
Map 1941
Fast Set 2121
{code}
Things are even worse when using default GC configuration
(-server -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -Djava.awt.headless=true)
{code:java}
Identity 7885
Set 2364
Map 1548
Fast Set 1520
{code}
Java version
{code:java}
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
{code}
This problem is greatly slowing Kafka Connect.

!image-2018-03-08-08-35-19-247.png!

  was:
Kafka Connect is using IdentityHashMap for storing records.
[https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L239]

Unfortunately this solution is very slow (2 times slower than normal HashMap / HashSet).

Benchmark result (code in attachment).
{code:java}
Identity 4220
Set 2115
Map 1941
Fast Set 2121
{code}
This problem is greatly slowing Kafka Connect.

!image-2018-03-08-08-35-19-247.png!

> Performance bottleneck in Kafka Connect sendRecords
> ---------------------------------------------------
>
>                 Key: KAFKA-6626
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6626
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Maciej Bryński
>            Priority: Major
>         Attachments: MapPerf.java, image-2018-03-08-08-35-19-247.png
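
The comparison described in the issue can be approximated with a small timing loop. The sketch below is not the attached MapPerf.java, whose contents are not shown here; the class name MapPerfSketch, the Rec key type, the key count, and the put-then-remove workload are all assumptions made for illustration. It times the same workload on IdentityHashMap, HashMap, and HashSet. The "Fast Set" variant from the results is left out because the issue does not say what it is.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; this is NOT the MapPerf.java attached to the issue.
final class MapPerfSketch {

    // Hypothetical stand-in for a record. It does not override equals or
    // hashCode, so HashMap and IdentityHashMap treat its instances alike.
    static final class Rec {
        final int id;
        Rec(int id) { this.id = id; }
    }

    // Creates n distinct key objects.
    static Rec[] makeKeys(int n) {
        Rec[] keys = new Rec[n];
        for (int i = 0; i < n; i++) {
            keys[i] = new Rec(i);
        }
        return keys;
    }

    // Inserts every key, then removes every key; returns elapsed nanoseconds.
    static long timeMap(Map<Rec, Rec> map, Rec[] keys) {
        long start = System.nanoTime();
        for (Rec k : keys) {
            map.put(k, k);
        }
        for (Rec k : keys) {
            map.remove(k);
        }
        return System.nanoTime() - start;
    }

    // Same workload as timeMap, applied to a Set.
    static long timeSet(Set<Rec> set, Rec[] keys) {
        long start = System.nanoTime();
        for (Rec k : keys) {
            set.add(k);
        }
        for (Rec k : keys) {
            set.remove(k);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Rec[] keys = makeKeys(1_000_000); // assumed size, not from the issue
        // Several rounds, so that later rounds run after JIT warm-up.
        for (int round = 0; round < 5; round++) {
            long identity = timeMap(new IdentityHashMap<>(), keys);
            long map = timeMap(new HashMap<>(), keys);
            long set = timeSet(new HashSet<>(), keys);
            System.out.printf("round %d: Identity %d ms, Map %d ms, Set %d ms%n",
                    round, identity / 1_000_000, map / 1_000_000, set / 1_000_000);
        }
    }
}
```

Timings from a hand-rolled loop like this vary with the JVM version, heap size, and GC flags, so a harness such as JMH would give more trustworthy numbers. Note also that IdentityHashMap compares keys by reference, so it is a drop-in match for HashMap only when the key class does not override equals and hashCode.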