Gwen: Right now I'm just looking for feedback -- but yes, if folks are
interested, I do plan to do that implementation work.

Daniel: Yes, that's exactly right. I haven't thought much about
per-key... it does sound useful, but the implementation seems a bit
more involved. Want to add it to the ticket?

On Fri, Jun 12, 2015 at 7:49 AM, Daniel Schierbeck
<daniel.schierb...@gmail.com> wrote:
> Ben: your solution seems to focus on partition-wide CAS. Have you
> considered per-key CAS? That would make the feature more useful in my
> opinion, as you'd greatly reduce the contention.
>
> On Fri, Jun 12, 2015 at 6:54 AM Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Hi Ben,
>>
>> Thanks for creating the ticket. Having check-and-set capability will be
>> sweet :)
>> Are you planning to implement this yourself? Or is it just an idea for
>> the community?
>>
>> Gwen
>>
>> On Thu, Jun 11, 2015 at 8:01 PM, Ben Kirwin <b...@kirw.in> wrote:
>> > As it happens, I submitted a ticket for this feature a couple days ago:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-2260
>> >
>> > Couldn't find any existing proposals for similar things, but it's
>> > certainly possible they're out there...
>> >
>> > On the other hand, I think you can solve your particular issue by
>> > reframing the problem: treating the messages as 'requests' or
>> > 'commands' instead of statements of fact. In your flight-booking
>> > example, the log would correctly reflect that two different people
>> > tried to book the same flight; the stream consumer would be
>> > responsible for finalizing one booking, and notifying the other client
>> > that their request had failed. (In-browser or by email.)
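>> >
>> > To sketch what that reframing could look like (plain Python standing
>> > in for a Kafka consumer; the event shapes and function names here are
>> > hypothetical, not anything from the ticket): every booking request is
>> > appended unconditionally, and a single consumer, reading in log
>> > order, confirms the first request per seat and rejects the rest.

```python
# A minimal sketch of the "commands, not facts" approach. The log
# records every attempt; the consumer below decides the outcome.
# All names (process_booking_requests, the dict fields) are invented
# for illustration -- this is not a real Kafka consumer.

def process_booking_requests(requests):
    """First request for each seat wins; later ones are rejected."""
    booked = {}    # seat -> id of the winning request
    results = []
    for req in requests:  # requests arrive in log order
        seat = req["seat"]
        if seat not in booked:
            booked[seat] = req["id"]
            results.append((req["id"], "confirmed"))
        else:
            # the losing client would be notified here (in-browser/email)
            results.append((req["id"], "rejected"))
    return results

# Two clients race for seat 12A; log order decides deterministically.
log = [
    {"id": "r1", "seat": "12A"},
    {"id": "r2", "seat": "12A"},
    {"id": "r3", "seat": "14C"},
]
print(process_booking_requests(log))
# [('r1', 'confirmed'), ('r2', 'rejected'), ('r3', 'confirmed')]
```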
>> >
>> > On Wed, Jun 10, 2015 at 5:04 AM, Daniel Schierbeck
>> > <daniel.schierb...@gmail.com> wrote:
>> >> I've been working on an application which uses Event Sourcing, and
>> >> I'd like to use Kafka as opposed to, say, a SQL database to store
>> >> events. This would allow me to easily integrate other systems by
>> >> having them read off the Kafka topics.
>> >>
>> >> I do have one concern, though: the consistency of the data can only
>> >> be guaranteed if a command handler has a complete picture of all
>> >> past events pertaining to some entity.
>> >>
>> >> As an example, consider an airline seat reservation system. Each
>> >> reservation command issued by a user is rejected if the seat has
>> >> already been taken. If the seat is available, a record describing
>> >> the event is appended to the log. This works great when there's only
>> >> one producer, but in order to scale I may need multiple producer
>> >> processes. This introduces a race condition: two command handlers
>> >> may simultaneously receive a command to reserve the same seat. The
>> >> event log indicates that the seat is available, so each handler will
>> >> append a reservation event – thus double-booking that seat!
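>> >>
>> >> The race plays out like this (plain Python lists standing in for
>> >> the log, with the two handlers' check-then-append steps interleaved
>> >> by hand -- purely illustrative, no real broker involved):

```python
# Both handlers run the availability check before either one writes,
# so each sees the seat as free, and both appends succeed.

log = []  # shared event log (stand-in for a Kafka partition)

def seat_free(seat):
    return not any(e["seat"] == seat for e in log)

# Interleaving: check-1, check-2, write-1, write-2.
seen_free_1 = seat_free("12A")   # handler 1 checks: free
seen_free_2 = seat_free("12A")   # handler 2 checks: still free
log.append({"seat": "12A", "by": "handler-1"})
log.append({"seat": "12A", "by": "handler-2"})

print(seen_free_1, seen_free_2, len(log))
# True True 2  -> two reservation events for one seat: double-booked
```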
>> >>
>> >> I see three ways around that issue:
>> >> 1. Don't use Kafka for this.
>> >> 2. Force a single producer for a given flight. This will impact
>> >> availability and make routing more complex.
>> >> 3. Have a way to do optimistic locking in Kafka.
>> >>
>> >> The latter idea would work either on a per-key basis or globally
>> >> for a partition: when appending to a partition, the producer would
>> >> indicate in its request that the request should be rejected unless
>> >> the current offset of the partition is equal to x. For the per-key
>> >> setup, Kafka brokers would track the offset of the latest message
>> >> for each unique key, if so configured. This would allow the request
>> >> to specify that it should be rejected if the offset for key k is
>> >> not equal to x.
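>> >>
>> >> A toy model of both conditional-append variants (plain Python in
>> >> place of the broker; the Partition class and its method names are
>> >> invented for this sketch and are not proposed Kafka APIs):

```python
# expected_offset=None means an unconditional append, matching Kafka's
# behavior today. With per_key=False the check is against the partition
# high-water mark; with per_key=True it is against the offset of the
# last message carrying the same key.

class Partition:
    def __init__(self):
        self.log = []                  # list of (key, value)
        self.last_offset_by_key = {}   # bookkeeping for the per-key variant

    def append(self, key, value, expected_offset=None, per_key=False):
        if expected_offset is not None:
            if per_key:
                current = self.last_offset_by_key.get(key, -1)
            else:
                current = len(self.log) - 1  # partition high-water mark
            if current != expected_offset:
                return None  # reject: someone wrote since the client read
        self.log.append((key, value))
        offset = len(self.log) - 1
        self.last_offset_by_key[key] = offset
        return offset

p = Partition()
# Handler A read no prior message for seat 12A (offset -1), so it wins:
o = p.append("12A", "reserved by A", expected_offset=-1, per_key=True)
# Handler B read the same stale state; its conditional write is rejected:
print(o, p.append("12A", "reserved by B", expected_offset=-1, per_key=True))
# 0 None
```

Note how the per-key variant lets a concurrent write to seat 14C in the same partition still succeed, which is where the reduced contention comes from.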
>> >>
>> >> This way, only one of the command handlers would succeed in writing to
>> >> Kafka, thus ensuring consistency.
>> >>
>> >> There are different levels of complexity associated with
>> >> implementing this in Kafka depending on whether the feature would
>> >> work per-partition or per-key:
>> >> * For the per-partition optimistic locking, the broker would just
>> >> need to keep track of the high water mark for each partition and
>> >> reject conditional requests when the offset doesn't match.
>> >> * For per-key locking, the broker would need to maintain an
>> >> in-memory table mapping keys to the offset of the last message with
>> >> that key. This should be fairly easy to maintain and recreate from
>> >> the log if necessary. It could also be saved to disk as a snapshot
>> >> from time to time in order to cut down the time needed to recreate
>> >> the table on restart. There's a small performance penalty associated
>> >> with this, but it could be opt-in for a topic.
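>> >>
>> >> Rebuilding that per-key table on restart could look roughly like
>> >> this (a sketch under the assumptions above -- the function and the
>> >> snapshot format are made up for illustration, not broker code):

```python
# Start from an optional on-disk snapshot, then replay only the log
# entries newer than the snapshot, so restart cost is proportional to
# the tail of the log rather than its full length.

def rebuild_key_table(log, snapshot=None, snapshot_offset=-1):
    """Return a key -> last-offset map for the given (key, value) log."""
    table = dict(snapshot or {})
    for offset, (key, _value) in enumerate(log):
        if offset > snapshot_offset:
            table[key] = offset  # last writer wins for each key
    return table

log = [("12A", "v1"), ("14C", "v1"), ("12A", "v2")]
print(rebuild_key_table(log))
# {'12A': 2, '14C': 1}

# With a snapshot taken at offset 0, only offsets 1 and 2 are replayed:
print(rebuild_key_table(log, snapshot={"12A": 0}, snapshot_offset=0))
# {'12A': 2, '14C': 1}
```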
>> >>
>> >> Am I the only one thinking about using Kafka like this? Would this be a
>> >> nice feature to have?
>>
