1) We are building a near-realtime monitoring application, so we need to drain the *HEARTBEAT* topic as soon as possible. We have 6000 servers, each producing heartbeat messages at ~1MB per minute. How can I partition messages so that a consumer can drain them as soon as they arrive? Basically, we have to meet a 30-second SLA from the time a message is created until a consumer processes it (we just insert the message into Cassandra for our monitoring tools). I was thinking of using HOSTNAME as the partitioning key so we can have 6000 consumers, one draining each partition. Can a new partition be created dynamically when a new server is added, and a consumer started automatically? Please recommend alternatives.
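One point worth illustrating before the answer: with a keyed partitioner you do not need one partition per host, because a stable hash of the key spreads many hostnames across however many partitions the topic has. The sketch below is illustrative only (it uses CRC32, not Kafka's actual key hashing, and `partition_for` is a made-up name, not a Kafka API):

```python
import zlib

def partition_for(hostname: str, num_partitions: int) -> int:
    """Map a hostname to a partition the way a keyed partitioner would:
    a stable hash of the key modulo the partition count. Illustrative --
    Kafka's real default partitioner hashes the key bytes differently."""
    return zlib.crc32(hostname.encode("utf-8")) % num_partitions

# 6000 hosts spread over a topic with only 12 partitions:
counts = [0] * 12
for i in range(6000):
    counts[partition_for(f"host-{i:04d}", 12)] += 1

# Each partition ends up with roughly 1/12 of the hosts, and every
# message from a given host always lands on the same partition.
print(counts)
```

Because the mapping is deterministic, all heartbeats from one host stay ordered within a single partition, while the partition count can stay small and be tuned to consumer throughput rather than to the number of servers.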
6000 servers sending 1MB/min each isn't a lot of data. The number of partitions for the topic depends largely on the throughput of your consumer, since that involves writing to Cassandra. Since the number of partitions can be increased on the fly without downtime, I recommend you start small, measure your consumer's throughput, and then increase the number of partitions to match the number of consumers you would have to deploy to easily keep up with the load. The 30-second SLA can be tricky since you are batching for 1 minute on the producers, so you will see a delay of 1 minute on every batch. If you set the right batching configs on the producer side, you could bring the end-to-end latency down to a few hundred milliseconds.

2) With the near-realtime requirement, if the consumer lags behind (let's say by 5 minutes or more due to downtime, a code roll, etc.), when the consumer group restarts, I would like to consume the latest messages first (to meet the SLA) before consuming from the last offset. How can I achieve this? Also, I need to be able to backfill messages.

It seems like a new consumer group starting from the latest message will work, provided you have a way to stop the old consumer group once it has caught up. You might have to place this logic in your application code. However, this will be much easier to do once we have our new consumer APIs <http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html>, since the new APIs provide an easy way to "seek" to certain offsets and provide enough information for you to know when the consumer has caught up to a certain offset. I also highly recommend using the soon-to-be-released 0.8.1.1 instead of 0.8.

Hope that helps.

Thanks,
Neha

On Thu, Apr 10, 2014 at 8:47 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:

> We will be using Kafka for the log message transport layer.
> Logs have a specific format:
>
> TIMESTAMP HOSTNAME ENVIRONMENT DATACENTER MESSAGE_TYPE APPLICATION_ID PAYLOAD
>
> There are two types of messages, *HEARTBEAT* and *APPLICATION_LOG*,
> so we have created two topics, *HEARTBEATS* and *APPLICATION_LOG*.
>
> I have the following questions on the consumer side:
>
> 1) We are building a near-realtime monitoring application, so we need to
> drain the *HEARTBEAT* topic as soon as possible. We have 6000 servers,
> each producing heartbeat messages at ~1MB per minute. How can I partition
> messages so that a consumer can drain them as soon as they arrive?
> Basically, we have to meet a 30-second SLA from the time a message is
> created until a consumer processes it (we just insert the message into
> Cassandra for our monitoring tools). I was thinking of using HOSTNAME as
> the partitioning key so we can have 6000 consumers, one draining each
> partition. Can a new partition be created dynamically when a new server
> is added, and a consumer started automatically? Please recommend
> alternatives.
>
> 2) With the near-realtime requirement, if the consumer lags behind (let's
> say by 5 minutes or more due to downtime, a code roll, etc.), when the
> consumer group restarts, I would like to consume the latest messages
> first (to meet the SLA) before consuming from the last offset. How can I
> achieve this? Also, I need to be able to backfill messages.
>
> Solution we have in mind:
>
> I was thinking of creating a new consumer group each time we have
> downtime and reading with the largest-offset configuration (to meet the
> 30-second SLA). We also wanted to backfill messages using the previous
> group until the new consumer is started, and the previous consumer group
> will die after the backfill is complete.
>
> Is there an alternative solution for consuming the latest messages and
> backfilling in the event of downtime? Please let us know what is done at
> LinkedIn or other large systems where SLAs have to be met.
> By the way, we will be using Kafka 0.8.
>
> Thanks,
>
> Bhavesh
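The catch-up idea discussed in this thread (a fresh group seeking to the latest offset to meet the SLA, plus a backfill group that replays only the gap and then stops) can be sketched as offset arithmetic. This is a minimal sketch with made-up names (`plan_catch_up` is not a Kafka API); with the newer consumer API, the same plan maps onto seeking to the log end for the realtime consumer and seeking the backfill consumer to the last committed offset:

```python
def plan_catch_up(committed_offset: int, end_offset: int):
    """Split recovery after downtime into two consumers:
    - a realtime consumer that starts at the current end of the log,
    - a backfill consumer that replays only the missed gap, then exits."""
    realtime_start = end_offset                      # seek to log end for the SLA
    backfill_offsets = range(committed_offset, realtime_start)  # the gap
    return realtime_start, backfill_offsets

# Example: consumer was down; last committed offset 1000, log end 4000.
realtime_start, backfill = plan_catch_up(1000, 4000)
print(realtime_start)   # 4000
print(len(backfill))    # 3000 missed messages to backfill
```

The key invariant is that the two ranges meet but never overlap: every missed message is processed exactly once by the backfill consumer, and the backfill group knows to shut down when it reaches `realtime_start`.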