Hi there,

I'm new to Samza/Kafka, and we're evaluating Samza to see whether it would
be a good fit for our application. I have a few questions about how
partitioning works.

I understand there is a limit on the number of topics we can create [1].
If we need more than, say, 10K topics, would it be a better idea to use
partitioning instead, or would the same limits apply? That is, would having
1 topic with 10K partitions produce the same performance issues as having
10K topics with 1 partition each?

If we can overcome the topic limit by creating more partitions, we'd like
to divide up our stream messages by client ID. Is it possible to group
partitions within a single topic, so that one set of partitions contains
only data from a certain client and another set of partitions contains
only data from another client?

For example, we might have a stream partition 'A' (for client A) with a
corresponding task 'a' that processes messages from partition 'A', and a
stream partition 'B' (for client B) with a corresponding task 'b' that
processes messages from partition 'B'. Our problem, though, is that we'd
like task 'a' to process only messages from partition 'A' and never from
partition 'B', since task 'a' may hold local state that applies
specifically to client A. Would this be possible?
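
To make the example concrete, here is a rough sketch of the mapping I have
in mind. It is plain Python, not Samza or Kafka code: ClientPartitionMap,
partition_for and route are names I made up for illustration, and I am
assuming each client can be given one dedicated partition.

```python
# Hypothetical sketch: none of these names are Samza or Kafka APIs.

class ClientPartitionMap:
    """Assigns each client ID its own dedicated partition number."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._assigned = {}  # client ID -> partition number

    def partition_for(self, client_id):
        # Reuse the existing assignment so a client never moves partitions.
        if client_id not in self._assigned:
            if len(self._assigned) >= self.num_partitions:
                raise ValueError("no free partition for client " + client_id)
            self._assigned[client_id] = len(self._assigned)
        return self._assigned[client_id]


def route(messages, mapping):
    """Groups (client_id, payload) pairs by the partition they would land in."""
    partitions = {}
    for client_id, payload in messages:
        partition = mapping.partition_for(client_id)
        partitions.setdefault(partition, []).append((client_id, payload))
    return partitions


mapping = ClientPartitionMap(num_partitions=2)
routed = route([("A", "a1"), ("B", "b1"), ("A", "a2")], mapping)
# Task 'a' would read only partition 0, and task 'b' only partition 1.
```

In this sketch, every message for client A ends up in one partition and
every message for client B in another, which is the isolation we would need
between task 'a' and task 'b'.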

Maybe I'm misunderstanding how Samza works, but I'm hoping someone can
help clarify. Thanks in advance for your help.

Susan



[1]
http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic
