[jira] [Updated] (KAFKA-15912) Parallelize conversion and transformation steps in Connect

Mickael Maison (Jira) Tue, 28 Nov 2023 03:28:11 -0800


     [ https://issues.apache.org/jira/browse/KAFKA-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mickael Maison updated KAFKA-15912:
-----------------------------------
    Description: 
In busy Connect pipelines, the conversion and transformation steps can 
sometimes have a very significant impact on performance. This is especially 
true with large records with complex schemas, for example with CDC connectors 
like Debezium.

Today, in order to always preserve ordering, converters and transformations
are called on one record at a time in a single thread in the Connect worker.
Connect usually handles records in batches: up to max.poll.records in sink
pipelines, and in source pipelines, while it really depends on the connector,
most connectors I've seen still tend to return multiple records each loop. It
could therefore be highly beneficial to attempt running the converters and
transformation chain in parallel using a pool of processing threads.

It should be possible to do some of these steps in parallel and still keep 
exact ordering. I'm even considering whether an option to lose ordering but 
allow even faster processing would make sense.
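
As a rough illustration of the "parallel but still ordered" idea, here is a
minimal Java sketch. It is not Connect's actual code and not a proposed
design: the class OrderedParallelBatch and the method processInOrder are
hypothetical names, and the step function stands in for a converter plus
transformation chain that is assumed to be stateless and thread-safe (which
real converters and transformations may not be). Each record of a batch is
submitted to a thread pool, and the results are then collected in submission
order, so the output order matches the input order even if tasks finish out
of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Kafka Connect.
class OrderedParallelBatch {

    // Applies `step` to every record of `batch` on `pool` and returns the
    // results in the same order as the input batch.
    // Assumption: `step` is stateless and safe to call from several threads.
    static <I, O> List<O> processInOrder(List<I> batch,
                                         Function<I, O> step,
                                         ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Submit one task per record; tasks may run and finish in any order.
        List<Future<O>> futures = new ArrayList<>(batch.size());
        for (I record : batch) {
            Callable<O> task = () -> step.apply(record);
            futures.add(pool.submit(task));
        }
        // Collect results by walking the futures in submission order, which
        // restores the original record order.
        List<O> results = new ArrayList<>(batch.size());
        for (Future<O> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> out =
                processInOrder(List.of("a", "b", "c"), String::toUpperCase, pool);
            System.out.println(out); // prints [A, B, C]
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the caller still waits for the whole batch before moving on,
so the gain comes only from running the per-record work concurrently. An
option that gives up ordering could instead hand records downstream as each
task completes.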

  was:
In busy Connect pipelines, the conversion and transformation steps can 
sometimes have a very significant impact on performance. This is especially 
true with large records with complex schemas, for example with CDC connectors.

Today in order to always preserve ordering, converters and transformations are 
called on one record at a time in a single thread in the Connect worker. As 
Connect usually handles records in batches (up to max.poll.records in sink 
pipelines, for source pipelines it depends on the connector), it could be 
highly beneficial to attempt running the converters and transformation chain in 
parallel by a pool of processing threads.

It should be possible to do some of these steps in parallel and still keep 
exact ordering. I'm even considering whether an option to lose ordering but 
allow even faster processing would make sense.


> Parallelize conversion and transformation steps in Connect
> ----------------------------------------------------------
>
>                 Key: KAFKA-15912
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15912
>             Project: Kafka
>          Issue Type: Improvement
>          Components: connect
>            Reporter: Mickael Maison
>            Priority: Major
>
> In busy Connect pipelines, the conversion and transformation steps can 
> sometimes have a very significant impact on performance. This is especially 
> true with large records with complex schemas, for example with CDC connectors 
> like Debezium.
> Today, in order to always preserve ordering, converters and transformations
> are called on one record at a time in a single thread in the Connect worker.
> Connect usually handles records in batches: up to max.poll.records in sink
> pipelines, and in source pipelines, while it really depends on the connector,
> most connectors I've seen still tend to return multiple records each loop. It
> could therefore be highly beneficial to attempt running the converters and
> transformation chain in parallel using a pool of processing threads.
> It should be possible to do some of these steps in parallel and still keep 
> exact ordering. I'm even considering whether an option to lose ordering but 
> allow even faster processing would make sense.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
