[ https://issues.apache.org/jira/browse/KAFKA-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mickael Maison updated KAFKA-15912:
-----------------------------------
Description:
In busy Connect pipelines, the conversion and transformation steps can
sometimes have a very significant impact on performance. This is especially
true for large records with complex schemas, for example with CDC connectors
like Debezium.
Today, in order to always preserve ordering, converters and transformations are
called on one record at a time in a single thread in the Connect worker.
Connect usually handles records in batches: up to max.poll.records in sink
pipelines, while in source pipelines the batch size depends on the connector,
although most connectors I've seen still tend to return multiple records on
each loop. It could therefore be highly beneficial to run the converters and
transformation chain in parallel on a pool of processing threads.
It should be possible to do some of these steps in parallel and still keep
exact ordering. I'm even considering whether an option to lose ordering but
allow even faster processing would make sense.
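To illustrate how a batch could be processed in parallel while keeping exact
ordering, here is a minimal sketch. It does not use any Connect API: the class
OrderedParallelBatch, the method processInOrder and the Function standing in
for the converter and transformation chain are all hypothetical names chosen
for this example. The idea is to submit one task per record and then collect
the futures in submission order, so the output order matches the input order
regardless of which thread finishes first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Kafka Connect.
public class OrderedParallelBatch {

    // Applies `chain` (a stand-in for converter + transformations) to every
    // record of `batch` on `pool`, returning the results in input order.
    public static <I, O> List<O> processInOrder(
            List<I> batch, Function<I, O> chain, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Submit one task per record; tasks may run and finish in any order.
        List<Future<O>> futures = new ArrayList<>(batch.size());
        for (I record : batch) {
            Callable<O> task = () -> chain.apply(record);
            futures.add(pool.submit(task));
        }
        // Collect results in submission order, which restores the ordering.
        List<O> results = new ArrayList<>(batch.size());
        for (Future<O> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> batch = List.of("a", "b", "c", "d");
            // Prints [A, B, C, D]
            System.out.println(processInOrder(batch, String::toUpperCase, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

This sketch assumes the chain is stateless and thread-safe for each record,
which may not hold for every converter or transformation. It also ignores
error handling and records dropped by a transformation.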
was:
In busy Connect pipelines, the conversion and transformation steps can
sometimes have a very significant impact on performance. This is especially
true with large records with complex schemas, for example with CDC connectors.
Today in order to always preserve ordering, converters and transformations are
called on one record at a time in a single thread in the Connect worker. As
Connect usually handles records in batches (up to max.poll.records in sink
pipelines, for source pipelines it depends on the connector), it could be
highly beneficial to attempt running the converters and transformation chain in
parallel by a pool of processing threads.
It should be possible to do some of these steps in parallel and still keep
exact ordering. I'm even considering whether an option to lose ordering but
allow even faster processing would make sense.
> Parallelize conversion and transformation steps in Connect
> ----------------------------------------------------------
>
> Key: KAFKA-15912
> URL: https://issues.apache.org/jira/browse/KAFKA-15912
> Project: Kafka
> Issue Type: Improvement
> Components: connect
> Reporter: Mickael Maison
> Priority: Major