app.id <http://app.id/> is used to identify an instance of your application. Each application consumes all partitions for all topics. With unique app.id <http://app.id/>, what you really have is two Samza jobs running simultaneously.
Sounds to me like what you’re expecting is a single Samza job with two containers. Jordan > On Feb 20, 2020, at 12:16 AM, Anoop Krishnakumar > <anoop.krishnaku...@gmail.com> wrote: > > Hello Team, > > I am testing scaling of Samza jobs in standalone deployment model. Jobs are > deployed in Kubernetes with zookeeper as job coordinator. The input Kafka > topic has two partitions and I am running two Samza instances. The > configuration supplied to both instances are identical except app.id, which > is 1,2 respectively. > > The expected result was each Samza instance would process one of the two > partitions. In reality both instances subscribes to partition 0 and 1, and > I could see duplicate messages in the output topic. Is there anything that > I am missing? > > Samza Version: 1.3.0 > Kafka Version: 2.1 > Zookeeper Version: 3.4 > > Configuration: > app.id=1 > app.name=test > app.class=insights.stream.AlertStreamApplication > job.default.system=kafka > job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory > task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory > # Task Checkpoint > task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory > job.coordinator.zk.connect=<ZK IP>:2181 > systems.kafka.producer.bootstrap.servers=<Kafka IP>:9092 > # Kafka System > systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory > systems.kafka.default.stream.replication.factor=1 > # Input Streams > streams.alert.samza.system=kafka > streams.alert.samza.physical.name=alert > # Output Streams > streams.events.samza.system=kafka > streams.events.samza.physical.name=events > > Thanks, > Anoop