Hello Team,

I am testing scaling of Samza jobs in standalone deployment model. Jobs are
deployed in Kubernetes with zookeeper as job coordinator. The input Kafka
topic has two partitions and I am running two Samza instances. The
configuration supplied to both instances are identical except app.id, which
is 1,2 respectively.

The expected result was each Samza instance would process one of the two
partitions. In reality both instances subscribes to partition 0 and 1, and
I could see duplicate messages in the output topic. Is there anything that
I am missing?

Samza Version: 1.3.0
Kafka Version: 2.1
Zookeeper Version: 3.4

Configuration:
app.id=1
app.name=test
app.class=insights.stream.AlertStreamApplication
job.default.system=kafka
job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
# Task Checkpoint
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
job.coordinator.zk.connect=<ZK IP>:2181
systems.kafka.producer.bootstrap.servers=<Kafka IP>:9092
# Kafka System
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.default.stream.replication.factor=1
# Input Streams
streams.alert.samza.system=kafka
streams.alert.samza.physical.name=alert
# Output Streams
streams.events.samza.system=kafka
streams.events.samza.physical.name=events

Thanks,
Anoop

Reply via email to