I've been working on an application which uses Event Sourcing, and I'd like
to use Kafka as opposed to, say, a SQL database to store events. This would
allow me to easily integrate other systems by having them read off the
Kafka topics.

I do have one concern, though: the consistency of the data can only be
guaranteed if a command handler has a complete picture of all past events
pertaining to some entity.

As an example, consider an airline seat reservation system. Each
reservation command issued by a user is rejected if the seat has already
been taken. If the seat is available, a record describing the event is
appended to the log. This works great when there's only one producer, but
in order to scale I may need multiple producer processes. This introduces a
race condition: two command handlers may simultaneously receive a command
to reserve the same seat. The event log indicates that the seat is
available, so each handler will append a reservation event – thus
double-booking that seat!
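
To make the race concrete, here's roughly what such a check-then-append
handler looks like in Java. All the names (EventLog, SeatReserved, etc.) are
made up for illustration; the point is only that nothing serializes steps 1
and 3 across processes:

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    class ReservationHandler {
        record SeatReserved(String flightId, String seatNo) {}

        // Hypothetical event-store interface, purely for illustration.
        interface EventLog {
            List<SeatReserved> read(String flightId);
            void append(String flightId, SeatReserved event);
        }

        private final EventLog log;

        ReservationHandler(EventLog log) { this.log = log; }

        void handle(String flightId, String seatNo) {
            // 1. Rebuild state: which seats are already taken on this flight?
            Set<String> taken = log.read(flightId).stream()
                    .map(SeatReserved::seatNo)
                    .collect(Collectors.toSet());

            // 2. Validate the command against that state.
            if (taken.contains(seatNo))
                throw new IllegalStateException("seat " + seatNo + " already taken");

            // 3. Append the event. Nothing stops another handler from
            //    appending between steps 1 and 3; that's the race.
            log.append(flightId, new SeatReserved(flightId, seatNo));
        }
    }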

I see three ways around that issue:
1. Don't use Kafka for this.
2. Force a single producer for a given flight. This will impact
availability and make routing more complex.
3. Have a way to do optimistic locking in Kafka.

The third option would work either on a per-key basis or globally for a
partition: when appending to a partition, the producer would indicate in
its request that the request should be rejected unless the current offset
of the partition is equal to x. For the per-key setup, Kafka brokers would
track the offset of the latest message for each unique key, if so
configured. This would allow the request to specify that it should be
rejected if the offset for key k is not equal to x.

This way, only one of the command handlers would succeed in writing to
Kafka, thus ensuring consistency.
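
On the producer side, this would be used like any compare-and-swap
primitive: read, validate, attempt the conditional append, and retry on
conflict. A minimal sketch of that loop, with entirely hypothetical names
since nothing like this exists in Kafka's client API today:

    import java.util.Optional;

    // Hypothetical: sketches the proposed conditional append, not a real API.
    abstract class OptimisticHandler {
        // Replay the log for this key and validate the command. Returns the
        // offset of the last event seen for the key, or empty if the command
        // must be rejected (e.g. the seat is already taken).
        abstract Optional<Long> validateAndGetLastOffset(String key);

        // The proposed primitive: append iff the latest offset for 'key'
        // (or for the whole partition) still equals 'expectedOffset'.
        // Returns false if another producer got there first.
        abstract boolean appendIfOffsetMatches(String key, byte[] event,
                                               long expectedOffset);

        void handle(String key, byte[] event) {
            while (true) {
                long expected = validateAndGetLastOffset(key)
                        .orElseThrow(() -> new IllegalStateException("rejected"));
                if (appendIfOffsetMatches(key, event, expected))
                    return; // our append won; the log stays consistent
                // Conflict: someone appended since we read. Loop to re-read
                // the log and re-validate before trying again.
            }
        }
    }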

Implementing this in Kafka involves different levels of complexity
depending on whether the feature works per-partition or per-key:
* For per-partition optimistic locking, the broker would just need to keep
track of the latest offset (the log end offset) of each partition and
reject conditional requests when the expected offset doesn't match.
* For per-key locking, the broker would need to maintain an in-memory table
mapping each key to the offset of the last message with that key (sketched
after this list). This should be fairly easy to maintain, and to recreate
from the log if necessary. It could also be snapshotted to disk from time to
time to cut down the time needed to rebuild the table on restart. There's a
small performance penalty associated with this, but the feature could be
opt-in per topic.
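
The per-key bookkeeping could be as small as a map from key to last offset,
checked while holding whatever lock assigns offsets; the per-partition case
is the same thing with a single counter instead of a map. A hypothetical
sketch, not actual broker code:

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    class KeyOffsetIndex {
        // Offset of the last message seen for each key in one partition.
        // Rebuilt by scanning the log on restart, or loaded from a snapshot.
        private final Map<ByteBuffer, Long> lastOffsetByKey = new HashMap<>();

        // Called with the partition's next offset while holding the log's
        // append lock. Rejects the write if the producer's expected offset
        // is stale, otherwise records the new offset and lets it proceed.
        synchronized long tryAppend(ByteBuffer key, long expectedOffset,
                                    long nextOffset) {
            long current = lastOffsetByKey.getOrDefault(key, -1L);
            if (current != expectedOffset)
                throw new IllegalStateException(
                        "expected offset " + expectedOffset + ", found " + current);
            lastOffsetByKey.put(key, nextOffset);
            return nextOffset;
        }
    }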

Am I the only one thinking about using Kafka like this? Would this be a
nice feature to have?
