Mickael Maison created KAFKA-15912:
--------------------------------------
Summary: Parallelize conversion and transformation steps in Connect
Key: KAFKA-15912
URL: https://issues.apache.org/jira/browse/KAFKA-15912
Project: Kafka
Issue Type: Improvement
Components: connect
Reporter: Mickael Maison
In busy Connect pipelines, the conversion and transformation steps can
sometimes have a very significant impact on performance. This is especially
true with large records with complex schemas, for example with CDC connectors.
Today, in order to always preserve ordering, converters and transformations are
called on one record at a time in a single thread in the Connect worker. As
Connect usually handles records in batches (up to max.poll.records in sink
pipelines; for source pipelines it depends on the connector), it could be
highly beneficial to attempt running the converters and transformation chain in
parallel on a pool of processing threads.
It should be possible to do some of these steps in parallel and still keep
exact ordering. I'm even considering whether an option to lose ordering but
allow even faster processing would make sense.
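
To illustrate how a batch could be processed in parallel while still keeping
exact ordering, here is a minimal sketch. It is not a proposed implementation
and does not use any Connect API: the class OrderedParallelBatch, the method
processInOrder and the Function standing in for the converter plus
transformation chain are all hypothetical names. It also assumes the chain is
thread-safe and does not depend on state carried from one record to the next,
which may not hold for every converter or transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Kafka Connect.
public class OrderedParallelBatch {

    // Applies `chain` to every record of `batch` on `pool` and returns the
    // results in the same order as the input records.
    // Assumption: `chain` is thread-safe and stateless across records.
    public static <I, O> List<O> processInOrder(List<I> batch,
                                                Function<I, O> chain,
                                                ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Submit one task per record; the futures list keeps input order.
        List<Future<O>> futures = new ArrayList<>(batch.size());
        for (I record : batch) {
            Callable<O> task = () -> chain.apply(record);
            futures.add(pool.submit(task));
        }
        // Collect results by walking the futures in submission order, so the
        // output order matches the input order regardless of which task
        // finishes first.
        List<O> results = new ArrayList<>(batch.size());
        for (Future<O> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> batch = List.of("a", "b", "c", "d");
            // String::toUpperCase stands in for the real conversion chain.
            List<String> out = processInOrder(batch, String::toUpperCase, pool);
            System.out.println(out); // prints [A, B, C, D]
        } finally {
            pool.shutdown();
        }
    }
}
```

The work runs concurrently, but results are read back in submission order, so
downstream code sees the batch in its original order. An option that gives up
ordering could instead consume results as they complete, for example with
java.util.concurrent.ExecutorCompletionService.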
--
This message was sent by Atlassian Jira
(v8.20.10#820010)