Gwen: Right now I'm just looking for feedback -- but yes, if folks are interested, I do plan to do that implementation work.
Daniel: Yes, that's exactly right. I haven't thought much about per-key... it does sound useful, but the implementation seems a bit more involved. Want to add it to the ticket?

On Fri, Jun 12, 2015 at 7:49 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:

> Ben: your solution seems to focus on partition-wide CAS. Have you
> considered per-key CAS? That would make the feature more useful in my
> opinion, as you'd greatly reduce the contention.
>
> On Fri, Jun 12, 2015 at 6:54 AM Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Hi Ben,
>>
>> Thanks for creating the ticket. Having check-and-set capability will be
>> sweet :)
>> Are you planning to implement this yourself? Or is it just an idea for
>> the community?
>>
>> Gwen
>>
>> On Thu, Jun 11, 2015 at 8:01 PM, Ben Kirwin <b...@kirw.in> wrote:
>>
>> > As it happens, I submitted a ticket for this feature a couple of days ago:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-2260
>> >
>> > Couldn't find any existing proposals for similar things, but it's
>> > certainly possible they're out there...
>> >
>> > On the other hand, I think you can solve your particular issue by
>> > reframing the problem: treating the messages as 'requests' or
>> > 'commands' instead of statements of fact. In your flight-booking
>> > example, the log would correctly reflect that two different people
>> > tried to book the same flight; the stream consumer would be
>> > responsible for finalizing one booking and notifying the other client
>> > that their request had failed. (In-browser or by email.)
>> >
>> > On Wed, Jun 10, 2015 at 5:04 AM, Daniel Schierbeck
>> > <daniel.schierb...@gmail.com> wrote:
>> >
>> >> I've been working on an application which uses Event Sourcing, and I'd
>> >> like to use Kafka as opposed to, say, a SQL database to store events.
>> >> This would allow me to easily integrate other systems by having them
>> >> read off the Kafka topics.
>> >>
>> >> I do have one concern, though: the consistency of the data can only be
>> >> guaranteed if a command handler has a complete picture of all past
>> >> events pertaining to some entity.
>> >>
>> >> As an example, consider an airline seat reservation system. Each
>> >> reservation command issued by a user is rejected if the seat has
>> >> already been taken. If the seat is available, a record describing the
>> >> event is appended to the log. This works great when there's only one
>> >> producer, but in order to scale I may need multiple producer processes.
>> >> This introduces a race condition: two command handlers may
>> >> simultaneously receive a command to reserve the same seat. The event
>> >> log indicates that the seat is available, so each handler will append a
>> >> reservation event – thus double-booking that seat!
>> >>
>> >> I see three ways around that issue:
>> >> 1. Don't use Kafka for this.
>> >> 2. Force a single producer for a given flight. This will impact
>> >> availability and make routing more complex.
>> >> 3. Have a way to do optimistic locking in Kafka.
>> >>
>> >> The latter idea would work either on a per-key basis or globally for a
>> >> partition: when appending to a partition, the producer would indicate
>> >> in its request that the request should be rejected unless the current
>> >> offset of the partition is equal to x. For the per-key setup, Kafka
>> >> brokers would track the offset of the latest message for each unique
>> >> key, if so configured.
>> >> This would allow the request to specify that it should be rejected if
>> >> the offset for key k is not equal to x.
>> >>
>> >> This way, only one of the command handlers would succeed in writing to
>> >> Kafka, thus ensuring consistency.
>> >>
>> >> There are different levels of complexity associated with implementing
>> >> this in Kafka depending on whether the feature would work per-partition
>> >> or per-key:
>> >> * For the per-partition optimistic locking, the broker would just need
>> >> to keep track of the high water mark for each partition and reject
>> >> conditional requests when the offset doesn't match.
>> >> * For per-key locking, the broker would need to maintain an in-memory
>> >> table mapping keys to the offset of the last message with that key.
>> >> This should be fairly easy to maintain and recreate from the log if
>> >> necessary. It could also be saved to disk as a snapshot from time to
>> >> time in order to cut down the time needed to recreate the table on
>> >> restart. There's a small performance penalty associated with this, but
>> >> it could be opt-in for a topic.
>> >>
>> >> Am I the only one thinking about using Kafka like this? Would this be a
>> >> nice feature to have?
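
To make the two proposed modes concrete, here is a rough, self-contained Java
sketch of the broker-side bookkeeping described in the thread: a toy,
single-threaded model of one partition's log that rejects a conditional append
when the expected offset (per partition, or per key) no longer matches. The
class and method names (ConditionalPartitionLog, appendIfPartitionOffset,
appendIfKeyOffset) are made up for illustration only and are not part of
Kafka's API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy, single-threaded model of one partition's log with the two proposed
// conditional-append modes. All names are illustrative; this is not broker code.
public class ConditionalPartitionLog {

    private final List<String> log = new ArrayList<>();                // messages; index == offset
    private final Map<String, Long> lastOffsetByKey = new HashMap<>(); // per-key "high water mark"

    // Per-partition CAS: append only if the partition's next offset equals expectedOffset.
    public Optional<Long> appendIfPartitionOffset(String key, String value, long expectedOffset) {
        if (log.size() != expectedOffset) {
            return Optional.empty();       // producer had a stale view of the partition -> reject
        }
        return Optional.of(append(key, value));
    }

    // Per-key CAS: append only if the last offset written for this key equals expectedOffset
    // (-1 meaning "the key has never been written").
    public Optional<Long> appendIfKeyOffset(String key, String value, long expectedOffset) {
        if (lastOffsetForKey(key) != expectedOffset) {
            return Optional.empty();       // someone else wrote this key first -> reject
        }
        return Optional.of(append(key, value));
    }

    public long lastOffsetForKey(String key) {
        return lastOffsetByKey.getOrDefault(key, -1L);
    }

    private long append(String key, String value) {
        long offset = log.size();
        log.add(key + "=" + value);
        lastOffsetByKey.put(key, offset);  // keep the in-memory key table in sync with the log
        return offset;
    }
}

Rebuilding the lastOffsetByKey map is just a replay of append() over the log,
which is why snapshotting it to disk (as suggested above) is an optimization
rather than a correctness requirement.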
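
And a short follow-up sketch that replays the double-booking race from the
flight example against that toy log, showing how the per-key check lets
exactly one command handler's write through. Again purely illustrative; the
sequential calls stand in for two concurrent producer processes.

import java.util.Optional;

// Both command handlers read the seat's key offset before either writes, so
// both believe the seat is free, but only the first conditional append is
// accepted; the loser knows it must reject (or re-evaluate) the command.
public class DoubleBookingDemo {

    public static void main(String[] args) {
        ConditionalPartitionLog partition = new ConditionalPartitionLog();

        // Both handlers observe the seat as unreserved (-1 = key never written).
        long offsetSeenByA = partition.lastOffsetForKey("seat-12A");
        long offsetSeenByB = partition.lastOffsetForKey("seat-12A");

        Optional<Long> alice = partition.appendIfKeyOffset("seat-12A", "reserved-by-alice", offsetSeenByA);
        Optional<Long> bob   = partition.appendIfKeyOffset("seat-12A", "reserved-by-bob",   offsetSeenByB);

        System.out.println("alice: " + alice); // Optional[0]    -> reservation committed
        System.out.println("bob:   " + bob);   // Optional.empty -> rejected; handler notifies the client
    }
}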