Re: Samza partition hashing relative to other clients

Stuart Perks Wed, 14 Dec 2022 23:46:16 -0800

Hi Malcolm, 

You should be able to override the following producer config for 
partitioner.class: 
https://kafka.apache.org/24/documentation.html#producerconfigs 
<https://kafka.apache.org/24/documentation.html#producerconfigs>


This can be done as follows via Samza config systems.system-name.producer.* : 
https://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html#kafka
 
<https://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html#kafka>

Caveat I haven’t tried this but should work from docs. 

But I would say ideally to be safe you should rekey/repartition on your 
consumer to protect against future producers that differ or producers 
accidentally changing the partitioner. 

Hope it helps, 

Stuart




> On 15 Dec 2022, at 04:01, Malcolm McFarland <mmcfarl...@cavulus.com> wrote:
> 
> Hey folks,
> 
> I'm working on a system where several different Kafka clients (including
> Samza) are producing into the same Kafka topic. It's necessary for each of
> these clients to calculate the same partition hash for the same key input
> to ensure consistent message ordering (there are some asynchronous actions
> that need to be ordered across systems). I've been able to get our non-JVM
> Kafka clients to calculate partition identifiers (using the murmur2 hashing
> algorithm) in the same manner as the official Java Kafka producers.
> However, it looks like Samza uses its own hashing algorithm[0]; this is
> fine for maintaining order if it's just Samza producing into a topic, but
> it's not so great if Samza is just one system of many that are working on a
> multi-stage task.
> 
> I've dug through the Samza and Kafka codebases quite a bit over the last
> few days, and I'm at a loss about how to get Samza to hash partition
> indexes in a way that's compatible with other producers. I've tried
> implementing Samza's hashing algorithm in other clients (ie with [1]), but
> cannot for the life of me get equivalent partition calculations in a
> non-JVM language.
> 
> Does anybody know a) if it's possible to define a custom key-to-partition
> hashing algorithm in Samza, or b) if there is a reliable general-purpose
> algorithm that can create the same results as Samza's algorithm?
> 
> Cheers,
> Malcolm McFarland
> Cavulus
> 
> [0]
> https://github.com/apache/samza/blob/1.7.0/samza-kafka/src/main/java/org/apache/samza/util/KafkaUtil.java#L47-L49
> [1]
> https://stackoverflow.com/questions/40303333/how-to-replicate-java-hashcode-in-c-language

Re: Samza partition hashing relative to other clients

Reply via email to