Unfortunately, we don't have a feature to publish to a specific partition. We tried to design with Kafka conventions in mind, and I don't believe we plan to add this functionality.
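That said, the Kafka convention the IO leans on is key-based routing: KafkaIO.write() takes keyed records, and Kafka's default partitioner hashes the record key, so every record with the same key lands on the same partition. As a minimal sketch of the write side (the broker address, topic name, class name, and sample payloads below are placeholders, not anything from this thread):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.KV;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedWriteSketch {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        // One KV per row-update event, keyed by the row id. Kafka's default
        // partitioner hashes the record key, so every update for row "123456"
        // is routed to the same partition of the target topic.
        pipeline
            .apply(Create.of(
                KV.of("123456", "{\"last_name\": \"Romero\", \"Event\": \"Update\"}"),
                KV.of("123456", "{\"last_name\": \"Romero2\", \"Event\": \"Update\"}")))
            .apply(KafkaIO.<String, String>write()
                .withBootstrapServers("broker-1:9092") // placeholder address
                .withTopic("target-topic")             // placeholder topic name
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class));

        pipeline.run().waitUntilFinish();
      }
    }

Because both updates share the key "123456", they end up on the same partition of the target topic without the producer ever naming a partition explicitly.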
On Thu, Apr 13, 2023 at 3:03 PM Juan Romero <jsrf...@gmail.com> wrote:

> Hi John. Thanks for your response!
>
> Point 2 is clear to me now. I had been reading a lot of documentation
> about it and only wanted to confirm with you.
>
> Regarding point 1, I know the drawbacks: we have to avoid hot partitions
> (for this purpose we can define a hash function that evenly distributes
> the messages across the three partitions). However, in our specific case
> we are using a tool that ingests the data from transactional databases
> (this tool allows us to define the partitioner function), and we want to
> persist the last state of a specific database row in a target OLAP
> database. For example, we could have the following events, which indicate
> that a specific row was updated:
>
> ("Id": "123456", "name": "Juan", "last_name": "Romero", "timestamp":
> "13-04-2023 13:44:00.100", "Event": "Update")
> ("Id": "123456", "name": "Juan", "last_name": "Romero2", "timestamp":
> "13-04-2023 13:44:00.200", "Event": "Update")
> ("Id": "123456", "name": "Juan", "last_name": "Romero3", "timestamp":
> "13-04-2023 13:44:00.300", "Event": "Update")
>
> Suppose all these events are being processed by different workers (in
> this case a worker is a consumer and a producer at the same time, which
> at the end pushes the data into another target topic). In the source we
> can guarantee that updates with the same id go to the same partition (so
> far the messages would be ordered). The problem is that we will have many
> consumers and producers in the middle. We can read the messages in order
> from the first topic in the chain (because the same key always goes to
> the same partition), but after that we have to process each message and
> push it into another topic that has 3 partitions (like I said before). In
> that case the producer will spread the row updates across the different
> partitions, and we can potentially lose the original message order,
> because the different updates referring to a specific row end up on
> multiple partitions.
>
> Let me know if you understand my case. If you want, I can give you a
> clear diagram to explain it.
>
> The general idea is that we need to push the last valid event from the
> transactional database to the target, and in the middle we have different
> Beam pipelines which act as consumers and producers.
>
> Thanks!!
>
> On Thu, Apr 13, 2023 at 1:23 PM John Casey via user
> (<user@beam.apache.org>) wrote:
>
>> Hi Juan,
>>
>> Under normal usage, Kafka will maintain ordering within a partition
>> without any extra work by you.
>>
>> For 2, you can use .commitOffsetsInFinalize to only commit back to the
>> source topic once the pipeline has persisted the message. At that point
>> it may not be fully processed, but it is guaranteed that it will be
>> processed.
>>
>> For 1, I would generally recommend against trying to write to a
>> specific partition. This undermines the load balancing that Kafka
>> enables with partitioning. I'd recommend separate topics instead. We
>> don't support writing to specific partitions in the current IO.
>>
>> John
>>
>> On Thu, Apr 13, 2023 at 2:15 PM Juan Romero <jsrf...@gmail.com> wrote:
>>
>>> Hi Apache Beam team. I have been working on a POC for the company I'm
>>> working for, and I'm using the Apache Beam Kafka connector to read
>>> from a Kafka topic and write into another Kafka topic. The source and
>>> target topics have 3 partitions, and it is compulsory to keep ordering
>>> by certain message keys.
>>> Regarding this I have two questions:
>>>
>>> 1. How can I write a specific message to a specific Kafka partition?
>>> 2. How can we commit the message back to the source topic only once
>>> the pipeline has processed it?
>>>
>>> I am looking forward to your reply and hope you can help me with
>>> these doubts.
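For readers of the archive: a minimal sketch of the .commitOffsetsInFinalize() read John described above. The broker address, topic name, group id, and class name are placeholders; a consumer group id must be supplied because that is where the offsets get committed:

    import java.util.Map;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class CommitOnFinalizeSketch {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        pipeline.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker-1:9092") // placeholder address
            .withTopic("source-topic")             // placeholder topic name
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // A consumer group id is required so there is somewhere to
            // commit the offsets back to.
            .withConsumerConfigUpdates(
                Map.<String, Object>of(ConsumerConfig.GROUP_ID_CONFIG, "poc-group"))
            // Offsets are committed to Kafka only after the runner has
            // durably checkpointed the records, so a message is never lost
            // even if downstream processing has not finished yet.
            .commitOffsetsInFinalize());

        pipeline.run().waitUntilFinish();
      }
    }

With this set, the offset for a record is committed back to Kafka only once the runner finalizes a checkpoint containing it, which is the "persisted, guaranteed to be processed" behavior described earlier in the thread.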