Re: application scenerio and suggested kafka setup

Guy Doulberg Sun, 27 Jan 2013 22:27:06 -0800

Hi Ahmed,

I can share my experience with you, since I have built a system similar to yours.

1. If all your messages are the same, I think you should use the default partitioner, so the messages will spread evenly across all the broker/partition combinations, unless you have a better function to spread them. I think the default partitioner picks a partition/broker combination randomly.

You are a bit wrong about the limitation: only one consumer within a consumer group is allowed to consume a given (broker, partition, topic) triplet. If you have several groups, each group should have a consumer reading from each of these triplets. The broker (Kafka server) can handle many connection threads. The reason for the group limitation is so that only one consumer within a group handles a given stream of events, and you do not need to worry about duplicates or about processing your events twice or more. Also notice that if one consumer within a group fails and other consumers in the group exist, the other consumers will take over the triplets that were consumed by the failed consumer.
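
To make the group rule concrete, here is a toy Java sketch. It is not Kafka's actual rebalancing algorithm; the class name, the "broker-partition" slot naming and the round-robin dealing are all assumptions made for illustration. It only shows that every slot has exactly one owner within a group, and that the survivors pick up the slots of a failed consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy model of consumer-group ownership,
// not Kafka's real rebalancing code.
class GroupAssignmentSketch {

    // Enumerate every "brokerId-partitionId" slot for one topic.
    static List<String> slots(int brokers, int partitionsPerBroker) {
        List<String> result = new ArrayList<>();
        for (int b = 0; b < brokers; b++) {
            for (int p = 0; p < partitionsPerBroker; p++) {
                result.add(b + "-" + p);
            }
        }
        return result;
    }

    // Give each slot exactly one owner within the group,
    // dealing slots out round-robin (an assumed, simplified policy).
    static Map<String, List<String>> assign(List<String> slots, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<String>> owned = new LinkedHashMap<>();
        for (String c : consumers) {
            owned.put(c, new ArrayList<>());
        }
        for (int i = 0; i < slots.size(); i++) {
            owned.get(consumers.get(i % consumers.size())).add(slots.get(i));
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> slots = slots(2, 2); // 0-0, 0-1, 1-0, 1-1

        // Two live consumers share the four slots.
        System.out.println(assign(slots, Arrays.asList("c1", "c2")));
        // prints {c1=[0-0, 1-0], c2=[0-1, 1-1]}

        // c2 fails: re-running the assignment hands its slots to c1.
        System.out.println(assign(slots, Arrays.asList("c1")));
        // prints {c1=[0-0, 0-1, 1-0, 1-1]}
    }
}
```

A second consumer group would run the same assignment independently, so each group sees the full stream of events.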



2. I think you are right here, this is at least what I have been doing.
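
A minimal sketch of what a singleton wrapper with batching could look like. Everything here is hypothetical: the class name, the batch size of 3, and the pluggable sink standing in for the real Kafka producer call. Servlets are called from many threads, so the methods are synchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper; the real Kafka producer call would go inside the sink.
final class BatchingProducer {
    private static final int BATCH_SIZE = 3; // assumed "x"; tune for real traffic

    // Single eagerly created instance, shared by all servlet threads.
    private static final BatchingProducer INSTANCE = new BatchingProducer();

    private final List<String> buffer = new ArrayList<>();
    // Default sink just prints; replace it with code that sends to Kafka.
    private Consumer<List<String>> sink = batch -> System.out.println("sending " + batch);

    private BatchingProducer() {}

    static BatchingProducer getInstance() {
        return INSTANCE;
    }

    // Swap in a different destination for finished batches.
    synchronized void setSink(Consumer<List<String>> newSink) {
        this.sink = newSink;
    }

    // Buffer one message; hand a full batch to the sink once BATCH_SIZE is reached.
    synchronized void send(String message) {
        buffer.add(message);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Send whatever is buffered, e.g. on servlet shutdown so nothing is lost.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        BatchingProducer producer = BatchingProducer.getInstance();
        for (int i = 1; i <= 4; i++) {
            producer.send("click-" + i);
        }
        producer.flush(); // pushes out the one leftover message
    }
}
```

A real version would probably also flush on a timer, so a partly filled batch does not wait forever during quiet periods.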

3. I didn't find one, but I have been using the auto-scaling feature of AWS on the producer side, and I guess it will take very little effort to do the same on the consumption side. You will have to create an auto-scaling group and configure the triggers to scale up and scale down, and that should do the trick. The rebalancing of the Kafka consumers is done automatically whenever a consumer comes up or goes down. Notice that the number of consumers is bounded by #brokers * #partitions.
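
The bound matters when you pick the maximum size of the auto-scaling group. A small Java sketch of the arithmetic, with hypothetical names and numbers (3 brokers with 4 partitions each):

```java
// Illustrative arithmetic only; names and numbers are assumptions.
class ConsumerCapSketch {

    // Upper bound on consumers in one group that can own at least one
    // (broker, partition) slot of a topic.
    static int maxUsefulConsumers(int brokers, int partitionsPerBroker) {
        return brokers * partitionsPerBroker;
    }

    // Consumers started beyond the bound would have nothing to read.
    static int idleConsumers(int running, int brokers, int partitionsPerBroker) {
        return Math.max(0, running - maxUsefulConsumers(brokers, partitionsPerBroker));
    }

    public static void main(String[] args) {
        System.out.println(maxUsefulConsumers(3, 4)); // prints 12
        System.out.println(idleConsumers(15, 3, 4));  // prints 3
    }
}
```

So with this setup, scaling the consumer group past 12 instances adds cost without adding throughput; to go further you would need more brokers or more partitions.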


Thanks, Guy

On 01/27/2013 11:06 PM, S Ahmed wrote:

Say I create web application/service where customers signup, and they place
some javascript on their website which will then send over http a message
to my servers every time someone clicks on a link on their website.

Each customer will send to their own custom subdomain like:

customer1.example.com/api/put?linkId=1&......

Say I have 100,000 customers.

1. If all events are of the same type, what are the potential means I could
partition my topics?  Or does it not make sense to?  I'm confused as to
what I am reading, is a given kafka topic + partition combination ONLY
allowed to be consumed by a single consumer group?  If so, why is that?
  the kafka server can only handle a single thread connecting to it??

2. I will have a java servlet that will contain my producer (each front end
server will have the same servlet that will contain a producer).  I want to
batch every x messages.  From what I understand, my producer is something I
will create using a singleton correct?

3. I want my consumers to be dynamic in size, so during peak hours I want
to fire up more nodes to keep up with traffic, is there a production
worthy consumer daemon that I can use (or learn from) that is open sourced
somewhere?

Much appreciated!
