I've been working on an application that uses Event Sourcing, and I'd like to use Kafka rather than, say, a SQL database to store the events. This would also let me easily integrate other systems by having them read off the Kafka topics.
I do have one concern, though: consistency can only be guaranteed if a command handler has a complete picture of all past events pertaining to an entity. As an example, consider an airline seat reservation system. Each reservation command issued by a user is rejected if the seat has already been taken; if the seat is available, an event describing the reservation is appended to the log. This works great when there's only one producer, but in order to scale I may need multiple producer processes, and that introduces a race condition: two command handlers may simultaneously receive a command to reserve the same seat. The event log indicates that the seat is available, so each handler appends a reservation event, double-booking that seat!

I see three ways around this issue:

1. Don't use Kafka for this.
2. Force a single producer for a given flight. This impacts availability and makes routing more complex.
3. Add a way to do optimistic locking in Kafka.

The third idea could work either on a per-key basis or globally for a partition: when appending to a partition, the producer would indicate in its request that the request should be rejected unless the current offset of the partition is equal to x. For the per-key setup, Kafka brokers would track the offset of the latest message for each unique key, if so configured. This would allow a request to specify that it should be rejected if the offset for key k is not equal to x. Either way, only one of the racing command handlers would succeed in writing to Kafka, thus ensuring consistency.

The implementation complexity differs depending on whether the feature works per partition or per key:

* For per-partition optimistic locking, the broker would just need to keep track of the high water mark for each partition and reject conditional requests whose expected offset doesn't match it (see the first sketch at the end of this post).
* For per-key locking, the broker would need to maintain an in-memory table mapping keys to the offset of the last message with that key (second sketch). This should be fairly easy to maintain, and to recreate from the log if necessary. It could also be snapshotted to disk from time to time in order to cut down the time needed to rebuild the table on restart. There's a small performance penalty associated with this, but it could be opt-in per topic.

Am I the only one thinking about using Kafka like this? Would this be a nice feature to have?
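To make the proposal concrete, here's a minimal sketch of what the producer side could look like with per-partition optimistic locking. None of this exists in Kafka today: `ConditionalLog`, `appendIf`, and `endOffset` are hypothetical names standing in for the proposed conditional-append API, and the event encoding is made up for the example.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical conditional-append client; Apache Kafka has no such API today. */
interface ConditionalLog {
    /** All events currently in the partition for this flight. */
    List<String> read(String flight);

    /** Offset that the next append to this flight's partition would receive. */
    long endOffset(String flight);

    /**
     * Appends the event only if the partition's end offset still equals
     * expectedOffset; returns the new offset on success, empty on a mismatch.
     */
    Optional<Long> appendIf(String flight, String event, long expectedOffset);
}

class ReservationHandler {
    private final ConditionalLog log;

    ReservationHandler(ConditionalLog log) { this.log = log; }

    /** Returns true if the seat was reserved, false if it was already taken. */
    boolean reserve(String flight, String seat, String passenger) {
        while (true) {
            // Note the end offset first, then read: if an event sneaks in
            // between the two calls, the conditional append below will fail
            // and we simply retry with fresh state.
            long observed = log.endOffset(flight);
            boolean taken = log.read(flight).stream()
                    .anyMatch(e -> e.startsWith("reserved:" + seat + ":"));
            if (taken) {
                return false; // seat already reserved: reject the command
            }
            String event = "reserved:" + seat + ":" + passenger;
            // Conditional append: succeeds only if no other handler appended
            // to this partition since we observed its end offset.
            if (log.appendIf(flight, event, observed).isPresent()) {
                return true;
            }
            // Offset moved under us: re-read the log and retry the command.
        }
    }
}
```

The retry loop is just the classic compare-and-swap pattern applied to a log: read, decide, append-if-unchanged, and start over on a conflict.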
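And here's a sketch of the broker-side bookkeeping for the per-key variant. Again, this is not Kafka code; it only illustrates the in-memory key-to-offset table described above, plus a replay path for rebuilding it from the log after a restart.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative broker-side table for per-key optimistic locking.
 * Maps each key to the offset of the last message written with that key.
 */
class PerKeyOffsetTable {
    private final Map<String, Long> lastOffsetByKey = new HashMap<>();
    private long endOffset = 0L; // next offset to be assigned in this partition

    /**
     * Attempts a conditional append: accepted only if the last message with
     * this key sits at expectedOffset (-1 meaning "no message with this key
     * yet"). Returns the assigned offset, or -1 on rejection.
     */
    synchronized long appendIf(String key, long expectedOffset) {
        long current = lastOffsetByKey.getOrDefault(key, -1L);
        if (current != expectedOffset) {
            return -1L; // reject: another producer won the race for this key
        }
        long assigned = endOffset++;
        lastOffsetByKey.put(key, assigned);
        return assigned;
    }

    /** Replays one log entry, e.g. when rebuilding the table on restart. */
    synchronized void replay(String key, long offset) {
        lastOffsetByKey.put(key, offset);
        endOffset = Math.max(endOffset, offset + 1);
    }
}
```

A nice side effect of the -1 sentinel is that it gives insert-only semantics for free: a producer can ask for the append to succeed only if no message with that key has ever been written.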