Hi Rabin,

I like this idea but it presents challenges for delivery semantics, especially (though not exclusively) with sink connectors.
If a sink connector uses the SinkTask::preCommit method [1] to explicitly notify the Connect runtime about which offsets are safe to commit, it may mistakenly report that an offset is safe to commit after successfully writing a single record with that offset to the sink system, even if a 1->many transformation has created multiple records with that offset. If a sink connector doesn't use that method, then we'd also have to do some extra bookkeeping in the Connect runtime to track progress within a batch of records that were all derived from a single original record in Kafka. We currently use consumer group offsets to track sink connector progress (unless the connector chooses to use an alternative mechanism), which doesn't support any finer granularity than one record at a time, so there would have to be some non-trivial design work to figure things out on that front.

There's also a question of how we'd want to apply this across entire chains of transformations. You mention the possibility of many->many; does this mean that there would be an opt-in interface for accepting an entire batch of records that was produced by a prior transformation?

Overall my sense of this feature so far has been that there are a lot of edge cases that can bite our users if we're not careful about addressing them with clever design work, which is at least part of the reason that we haven't implemented it yet. I'd be interested in your thoughts about how we can make this feature safe and easy to use without adding footguns for existing connectors and deployments.

[1] - https://kafka.apache.org/34/javadoc/org/apache/kafka/connect/sink/SinkTask.html#preCommit(java.util.Map)

Cheers,

Chris

On Tue, May 16, 2023 at 6:26 AM Rabin Banerjee <rabin_baner...@apple.com.invalid> wrote:

> Hi Team,
>
> We have a few use cases where nested record with array needs to be
> flattened / exploded to multiple rows and write it to destination like
> Cassandra.
>
> Now current Transformation interface in Kafka connect takes one record and
> returns one / zero record.
> Also current flatten ignores the array
> https://issues.apache.org/jira/browse/KAFKA-12305.
>
> We would like to know about your thought on enhancing the transformation
> interface to produce more than one record , 1 -> Many or Many -> Many.
>
> We understand an alternative is to use an upstream KStream pipeline but
> that has multiple challenges like adding extra hop, more pipelines to
> maintain etc.
>
> Enhancing the Transformation interface would allow us to have a Generic
> SMT to handle this like Explode, similar to
> https://github.com/apache/spark/blob/007c4593058537251d83a4cb44efe31e394aee22/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4542
>
> Thanks
> Rabin
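
P.S. Purely to make the "opt-in interface" question above concrete, here's a rough strawman of the kind of shape a 1->many transformation API could take. None of these names (ExpandingTransformation, applyAll) exist in Kafka Connect today, and this says nothing about the runtime-side offset bookkeeping, which is where the real design work lies:

    import java.util.List;

    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.errors.DataException;
    import org.apache.kafka.connect.transforms.Transformation;

    // Strawman only: an opt-in extension of the existing single-record
    // Transformation interface. A transformation implementing this could be
    // detected by the runtime and invoked through applyAll instead of apply.
    public interface ExpandingTransformation<R extends ConnectRecord<R>> extends Transformation<R> {

        // 1 -> many: one upstream record may expand into zero or more records.
        List<R> applyAll(R record);

        // Bridge for call sites that only know the existing single-record API.
        @Override
        default R apply(R record) {
            List<R> expanded = applyAll(record);
            if (expanded.isEmpty()) {
                return null;
            }
            if (expanded.size() == 1) {
                return expanded.get(0);
            }
            throw new DataException("Transformation produced " + expanded.size()
                    + " records but was invoked through the single-record apply(R) API");
        }
    }

The many->many case you mention would presumably need a List<R> -> List<R> variant instead, which is exactly where the question about chains of transformations comes in.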