We will be using Kafka as the log message transport layer. Each log message has a specific format:
TIMESTAMP HOSTNAME ENVIRONMENT DATACENTER MESSAGE_TYPE APPLICATION_ID PAYLOAD

There are two types of messages, *HEARTBEAT* and *APPLICATION_LOG*, so we have created two topics, *HEARTBEATS* and *APPLICATION_LOG*. I have the following questions on the consumer side:

1) We are building a near-realtime monitoring application, so we need to be able to drain the *HEARTBEATS* topic as soon as possible. We have 6000 servers, each producing heartbeat messages at a rate of ~1MB per minute. How can I partition the messages so that consumers can drain them as soon as they arrive? Basically, we have to meet a 30-second SLA from the time a message is created until a consumer processes it (we just insert the message into Cassandra) for our monitoring tools. I was thinking about using HOSTNAME as the partitioning key, so we can have 6000 consumers, each draining its own partition. Can a new partition be created dynamically when a new server is added, and a consumer be started automatically? Please recommend alternatives.

2) Given the near-realtime requirement, if the consumers fall behind (let's say by 5 minutes or more, due to downtime, a code roll, etc.), then when the consumer group restarts I would like to consume the latest messages first (to meet the SLA) before consuming from the last committed offset. How can I achieve this? I also need to be able to backfill the missed messages.

The solution we have in mind: each time we have downtime, create a new consumer group configured to read from the largest offset (to meet the 30-second SLA). Meanwhile, we would backfill the missed messages using the previous consumer group, which would die once the backfill is complete. Is there an alternative solution for consuming the latest messages and backfilling after downtime?

Please let us know what is being done at LinkedIn or at other large systems where there is an SLA to meet. By the way, we will be using Kafka 0.8.

Thanks,
Bhavesh
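P.S. To make question 1 concrete, here is a rough sketch (in Python, purely illustrative) of the hostname-keyed partition mapping we are considering. All names here are ours, not a real Kafka API; the idea mirrors keyed partitioning, where the key's hash modulo the partition count picks the partition:

```python
# Sketch of hostname-keyed partitioning (illustrative only, not Kafka code).
import hashlib

def partition_for(hostname: str, num_partitions: int) -> int:
    """Map a hostname to a partition deterministically, so all heartbeats
    from one server land in the same partition and stay ordered."""
    # Use a stable hash (hashlib) rather than Python's per-process hash().
    digest = hashlib.md5(hostname.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# With 6000 hosts and 6000 partitions, each consumer would drain
# roughly one host's heartbeat stream.
NUM_PARTITIONS = 6000
p1 = partition_for("app-server-0001.dc1", NUM_PARTITIONS)
p2 = partition_for("app-server-0001.dc1", NUM_PARTITIONS)
assert p1 == p2  # the same host always maps to the same partition
```

Note that with hashing, a host keeps its partition even as new hosts are added, but nothing guarantees exactly one host per partition; that is part of why we are asking about dynamically adding partitions per server.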
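P.P.S. And a sketch of the catch-up/backfill decision from question 2 (again, hypothetical pseudologic in Python; `plan_restart` and its parameters are our own names, and the "jump to latest" branch is what we would implement by starting a fresh consumer group that reads from the largest offset):

```python
def plan_restart(committed_offset: int, latest_offset: int, max_lag: int) -> dict:
    """Decide where the real-time consumer should resume after a restart.

    If the gap is within max_lag messages, just resume from the committed
    offset. Otherwise, jump the live consumer to the latest offset (to meet
    the 30-second SLA) and hand the missed range to a separate backfill
    consumer that dies once it reaches the jump point.
    """
    lag = latest_offset - committed_offset
    if lag <= max_lag:
        return {"live_start": committed_offset, "backfill": None}
    return {
        "live_start": latest_offset,
        "backfill": (committed_offset, latest_offset),  # missed range [from, to)
    }

# Small lag: resume normally, no backfill needed.
print(plan_restart(100, 150, max_lag=1000))
# → {'live_start': 100, 'backfill': None}

# Large lag after downtime: live consumer skips ahead, backfill fills the gap.
print(plan_restart(100, 50_000, max_lag=1000))
# → {'live_start': 50000, 'backfill': (100, 50000)}
```

The open question for us is whether running two groups like this is the usual pattern, or whether there is a simpler built-in way in 0.8.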