Hi,

I was reading through the docs and would like to make use
of Kafka's dynamic partition assignment feature in a way
that partitions get distributed evenly across all consumers.
To my understanding the RoundRobinAssignor should be sufficient;
I have set aside the problem of rebalancing when a consumer
dies for the moment.

If I understood it correctly, each consumer is assigned at
least one partition, and each partition goes to exactly one
consumer within the group. This means if you have e.g. 10
consumers you need at least 10 partitions for the dynamic
partition assignment to serve all of them. Is this correct?
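
For reference, a minimal sketch I would use to check how many
partitions a topic actually has (assuming the 'cb-request' topic
and the broker address used further below):

check_partitions.py
----
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='...9092')

# partitions_for_topic() returns the set of partition ids of the
# topic (None if the topic does not exist); its size must be >=
# the number of consumers in the group
partitions = consumer.partitions_for_topic('cb-request')
print('cb-request has {0} partitions'.format(len(partitions)))
consumer.close()
----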

Given my understanding is correct: if there are 1000 messages
written to topic "A" and 10 consumers using the RoundRobinAssignor,
I would expect roughly 100 messages to be consumed by each
consumer, assuming the producer spreads the messages evenly
across the partitions. Is this still correct?
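
To verify that distribution, a sketch that counts the messages
per partition from the beginning/end offsets (again assuming the
topic and broker from the code below):

count_per_partition.py
----
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='...9092')

# messages per partition = end offset - beginning offset
partitions = [
    TopicPartition('cb-request', p)
    for p in consumer.partitions_for_topic('cb-request')
]
start = consumer.beginning_offsets(partitions)
end = consumer.end_offsets(partitions)
for partition in partitions:
    print('{0}: {1} messages'.format(
        partition, end[partition] - start[partition]))
consumer.close()
----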

Based on this I set up Kafka in Amazon using Amazon MSK and
went through the manual setup procedure so that there is a
chance to influence the default settings. I set up the cluster
with a maximum of 100 partitions.
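
In case it matters, a minimal sketch of how the topic could be
created with an explicit partition count via kafka-python's admin
client (the topic name, 10 partitions, and the replication factor
are just example values here):

create_topic.py
----
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='...9092')

# 10 partitions so that 10 consumers in one group can each own
# at least one; replication_factor depends on the broker count
admin.create_topics([
    NewTopic(name='cb-request', num_partitions=10, replication_factor=2)
])
admin.close()
----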

Next I wrote Python code using the kafka-python module (v2.0.2)
to prove that I understood the concept before starting the
real implementation.

My producer does:

write.py
----
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='...9092'
)

# send messages 1..99 without a key; the default partitioner
# decides which partition each message is written to
for count in range(1, 100):
    producer.send('cb-request', 'Message {0}'.format(count).encode())

# block until all buffered messages have been sent
producer.flush()
----
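
To see where the messages actually end up, the future returned
by send() can be resolved to RecordMetadata. A variant of
write.py along those lines (same broker and topic assumed):

write_verbose.py
----
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='...9092'
)

# resolving the future returned by send() yields RecordMetadata,
# which includes the partition the broker stored the message in
for count in range(1, 100):
    future = producer.send('cb-request', 'Message {0}'.format(count).encode())
    metadata = future.get(timeout=10)
    print('Message {0} -> partition {1}'.format(count, metadata.partition))

producer.close()
----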


My consumer does:

read.py
----
from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor

consumer = KafkaConsumer(
    'cb-request',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    auto_offset_reset='earliest',
    enable_auto_commit=False,
    bootstrap_servers='...9092',
    group_id='cb-request-group'
)

try:
    while True:
        # poll() returns a dict: TopicPartition -> list of ConsumerRecords
        raw_messages = consumer.poll(timeout_ms=10000)
        for topic_partition, message_list in raw_messages.items():
            print(topic_partition)
            print(message_list)
        # commit the offsets of the records received so far
        consumer.commit()
finally:
    consumer.close()
----
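
To check which partitions a given read.py instance actually owns
within the group, assignment() can be printed after a poll(). A
minimal sketch based on the consumer above:

check_assignment.py
----
from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor

consumer = KafkaConsumer(
    'cb-request',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    bootstrap_servers='...9092',
    group_id='cb-request-group'
)

# the consumer joins the group and receives its partition
# assignment only after the first poll()
consumer.poll(timeout_ms=10000)
print('assigned partitions: {0}'.format(consumer.assignment()))
consumer.close()
----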


Now if I run this, it works in the sense that all produced
messages are delivered to a consumer, but it never scales.
Meaning: if I run multiple read.py instances, there is always
only one that gets all the messages.

The messages look like this:

ConsumerRecord(topic='cb-request', partition=0, offset=3964, 
timestamp=1625085873665, timestamp_type=0, key=None, value=b'Message 99', 
headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=10, 
serialized_header_size=-1)

and the partition assignment is always "partition=0".

I think I'm doing it wrong or have misunderstood the concepts.

Thus I'm writing here to kindly ask for help, or to be put
right.

Thanks much

Regards,
Marcus
-- 
 Public Key available via: https://keybase.io/marcus_schaefer/key.asc
 keybase search marcus_schaefer
 -------------------------------------------------------
 Marcus Schäfer                 Am Unterösch 9
 Tel: +49 7562 905437           D-88316 Isny / Rohrdorf
 Germany
 -------------------------------------------------------
