app.id <http://app.id/> is used to identify an instance of your application. 
Each application consumes all partitions for all topics. With unique app.id 
<http://app.id/>, what you really have is two Samza jobs running simultaneously.

Sounds to me like what you’re expecting is a single Samza job with two 
containers.

Jordan

> On Feb 20, 2020, at 12:16 AM, Anoop Krishnakumar 
> <anoop.krishnaku...@gmail.com> wrote:
> 
> Hello Team,
> 
> I am testing scaling of Samza jobs in standalone deployment model. Jobs are
> deployed in Kubernetes with zookeeper as job coordinator. The input Kafka
> topic has two partitions and I am running two Samza instances. The
> configuration supplied to both instances are identical except app.id, which
> is 1,2 respectively.
> 
> The expected result was each Samza instance would process one of the two
> partitions. In reality both instances subscribes to partition 0 and 1, and
> I could see duplicate messages in the output topic. Is there anything that
> I am missing?
> 
> Samza Version: 1.3.0
> Kafka Version: 2.1
> Zookeeper Version: 3.4
> 
> Configuration:
> app.id=1
> app.name=test
> app.class=insights.stream.AlertStreamApplication
> job.default.system=kafka
> job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> # Task Checkpoint
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> job.coordinator.zk.connect=<ZK IP>:2181
> systems.kafka.producer.bootstrap.servers=<Kafka IP>:9092
> # Kafka System
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.default.stream.replication.factor=1
> # Input Streams
> streams.alert.samza.system=kafka
> streams.alert.samza.physical.name=alert
> # Output Streams
> streams.events.samza.system=kafka
> streams.events.samza.physical.name=events
> 
> Thanks,
> Anoop

Reply via email to