Hi,

I have used Apache Kafka in conjunction with Spark as a messaging source. This rather dated diagram describes the setup.
I have two physical hosts, each with 64 GB, running RHES 7.6; they are called rhes75 and rhes76 respectively. The Zookeeper version is 3.7.1 and the Kafka version is 3.4.0.

[image: image.png]

I have a topic md -> MarketData that was created as below:

    kafka-topics.sh --create \
      --bootstrap-server rhes75:9092,rhes75:9093,rhes75:9094,rhes76:9092,rhes76:9093,rhes76:9094,rhes76:9095,rhes76:9096,rhes76:9097 \
      --replication-factor 9 --partitions 9 --topic md

    kafka-topics.sh --describe \
      --bootstrap-server rhes75:9092,rhes75:9093,rhes75:9094,rhes76:9092,rhes76:9093,rhes76:9094,rhes76:9095,rhes76:9096,rhes76:9097 \
      --topic md

This is working fine:

    Topic: md  TopicId: UfQly87bQPCbVKoH-PQheg  PartitionCount: 9  ReplicationFactor: 9  Configs: segment.bytes=1073741824
      Topic: md  Partition: 0  Leader: 12  Replicas: 12,10,8,2,9,11,1,7,3  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 1  Leader: 9   Replicas: 9,8,2,12,11,1,7,3,10  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 2  Leader: 11  Replicas: 11,2,12,9,1,7,3,10,8  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 3  Leader: 1   Replicas: 1,12,9,11,7,3,10,8,2  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 4  Leader: 7   Replicas: 7,9,11,1,3,10,8,2,12  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 5  Leader: 3   Replicas: 3,11,1,7,10,8,2,12,9  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 6  Leader: 10  Replicas: 10,1,7,3,8,2,12,9,11  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 7  Leader: 8   Replicas: 8,7,3,10,2,12,9,11,1  Isr: 10,1,9,2,12,7,3,11,8
      Topic: md  Partition: 8  Leader: 2   Replicas: 2,3,10,8,12,9,11,1,7  Isr: 10,1,9,2,12,7,3,11,8

However, I have a number of questions:

1. Does having 9 partitions with a replication factor of 9 make sense here?
2. As I understand it, the parallelism is equal to the number of partitions for a topic.
3. Kafka only provides a total order over messages *within a partition*, not between different partitions in a topic, and in this case I have one topic.
4. Data within a partition will be stored in the order in which it is written; therefore, data read from a partition will be read in order for that partition?
5. Finally, if I want to get messages in order across all 9 partitions, then I need to group messages with a key, so that messages with the same key go to the same partition, and within that partition the messages are ordered.

Thanks

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
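
P.S. To make question 5 concrete, here is a minimal Python sketch of how I picture keyed routing. It is only a simulation of my understanding, not Kafka itself: the ticker keys and sequence numbers are made up, the names partition_for and route are hypothetical, and zlib.crc32 is a stand-in hash (as far as I know Kafka's default partitioner uses murmur2 on the serialized key, so the real partition numbers would differ).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 9  # matches --partitions 9 on topic md


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a deterministic hash.

    Stand-in only: crc32 is used here for illustration and is NOT the
    hash Kafka's default partitioner applies.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Append each (key, value) pair to its partition log in arrival order."""
    logs = defaultdict(list)
    for key, value in messages:
        logs[partition_for(key)].append((key, value))
    return logs


# Hypothetical market data ticks: (ticker, per-ticker sequence number)
ticks = [
    ("IBM", 1), ("MSFT", 1), ("IBM", 2),
    ("ORCL", 1), ("MSFT", 2), ("IBM", 3),
]
logs = route(ticks)

# Each partition log keeps its own write order; there is no order across logs.
for partition in sorted(logs):
    print(partition, logs[partition])
```

In this sketch every tick for a given ticker lands in one partition log and keeps its sequence there, while nothing relates the position of an IBM tick to an ORCL tick in a different partition. That is the behaviour I am asking about.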