Ben: your solutions seems to focus on partition-wide CAS. Have you considered per-key CAS? That would make the feature more useful in my opinion, as you'd greatly reduce the contention.
On Fri, Jun 12, 2015 at 6:54 AM Gwen Shapira <gshap...@cloudera.com> wrote: > Hi Ben, > > Thanks for creating the ticket. Having check-and-set capability will be > sweet :) > Are you planning to implement this yourself? Or is it just an idea for > the community? > > Gwen > > On Thu, Jun 11, 2015 at 8:01 PM, Ben Kirwin <b...@kirw.in> wrote: > > As it happens, I submitted a ticket for this feature a couple days ago: > > > > https://issues.apache.org/jira/browse/KAFKA-2260 > > > > Couldn't find any existing proposals for similar things, but it's > > certainly possible they're out there... > > > > On the other hand, I think you can solve your particular issue by > > reframing the problem: treating the messages as 'requests' or > > 'commands' instead of statements of fact. In your flight-booking > > example, the log would correctly reflect that two different people > > tried to book the same flight; the stream consumer would be > > responsible for finalizing one booking, and notifying the other client > > that their request had failed. (In-browser or by email.) > > > > On Wed, Jun 10, 2015 at 5:04 AM, Daniel Schierbeck > > <daniel.schierb...@gmail.com> wrote: > >> I've been working on an application which uses Event Sourcing, and I'd > like > >> to use Kafka as opposed to, say, a SQL database to store events. This > would > >> allow me to easily integrate other systems by having them read off the > >> Kafka topics. > >> > >> I do have one concern, though: the consistency of the data can only be > >> guaranteed if a command handler has a complete picture of all past > events > >> pertaining to some entity. > >> > >> As an example, consider an airline seat reservation system. Each > >> reservation command issued by a user is rejected if the seat has already > >> been taken. If the seat is available, a record describing the event is > >> appended to the log. This works great when there's only one producer, > but > >> in order to scale I may need multiple producer processes. This > introduces a > >> race condition: two command handlers may simultaneously receive a > command > >> to reserver the same seat. The event log indicates that the seat is > >> available, so each handler will append a reservation event – thus > >> double-booking that seat! > >> > >> I see three ways around that issue: > >> 1. Don't use Kafka for this. > >> 2. Force a singler producer for a given flight. This will impact > >> availability and make routing more complex. > >> 3. Have a way to do optimistic locking in Kafka. > >> > >> The latter idea would work either on a per-key basis or globally for a > >> partition: when appending to a partition, the producer would indicate in > >> its request that the request should be rejected unless the current > offset > >> of the partition is equal to x. For the per-key setup, Kafka brokers > would > >> track the offset of the latest message for each unique key, if so > >> configured. This would allow the request to specify that it should be > >> rejected if the offset for key k is not equal to x. > >> > >> This way, only one of the command handlers would succeed in writing to > >> Kafka, thus ensuring consistency. > >> > >> There are different levels of complexity associated with implementing > this > >> in Kafka depending on whether the feature would work per-partition or > >> per-key: > >> * For the per-partition optimistic locking, the broker would just need > to > >> keep track of the high water mark for each partition and reject > conditional > >> requests when the offset doesn't match. > >> * For per-key locking, the broker would need to maintain an in-memory > table > >> mapping keys to the offset of the last message with that key. This > should > >> be fairly easy to maintain and recreate from the log if necessary. It > could > >> also be saved to disk as a snapshot from time to time in order to cut > down > >> the time needed to recreate the table on restart. There's a small > >> performance penalty associated with this, but it could be opt-in for a > >> topic. > >> > >> Am I the only one thinking about using Kafka like this? Would this be a > >> nice feature to have? >