Hi, after reading http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying, I am considering Kafka for an application built around CQRS and Event Sourcing.
Disclaimer: I have read the documentation but do not have any hands-on experience with Kafka at this time.

In such a setup, the queried state is built by applying every event from the beginning. It is also important:
- that all events are ordered, at least per entity
- that all events are stored (no deletion), OR that events are compacted in such a way that the final state stays the same

Questions:

- I read that Kafka can delete events based on time or on disk usage. Is it possible to completely deactivate event deletion, without using log compaction? (See the configuration sketch in the P.S.; log compaction is my next question.)

- Kafka can also compact the log (https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction and http://kafka.apache.org/documentation.html#compaction). How can we structure the events so that the final state stays the same after compaction? For example, if I have the following events:
  - create user 456
  - for user 456, set email "email1@dns"
  - for user 456, set email "email2@dns"
  compaction should keep the user creation and the last email setting. Should I key the events like this (see the producer sketch in the P.S.)?
  - key "user-456-creation": create user 456
  - key "user-456-email-set": for user 456, set email "email1@dns"
  - key "user-456-email-set": for user 456, set email "email2@dns"

- Can we provide custom log compaction logic?

If somebody is using Kafka for this purpose, I'd be glad to hear about your experience.

Cheers,
Yann
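
P.S. To make the retention question concrete, here is the kind of topic configuration I have in mind. This is only a sketch based on my reading of the docs (the topic name, partition and replication counts are made up); my understanding is that setting both retention limits to -1 removes time-based and size-based deletion:

    # Create a topic that never deletes events:
    # retention.ms=-1    -> no time-based deletion
    # retention.bytes=-1 -> no size-based deletion
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --topic user-events --partitions 8 --replication-factor 3 \
      --config retention.ms=-1 \
      --config retention.bytes=-1

Is that the supported way to keep every event forever?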
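
And for the keying question, a sketch of what I would send with the Java producer (again, the topic name and the plain string serialization are just for illustration, assuming the new Java producer client):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // The creation event has its own key, so compaction never
                // discards it in favour of an email update.
                producer.send(new ProducerRecord<>("user-events",
                        "user-456-creation", "create user 456"));
                // Both email events share one key, so after compaction only
                // the latest value ("email2@dns") should survive.
                producer.send(new ProducerRecord<>("user-events",
                        "user-456-email-set", "set email email1@dns"));
                producer.send(new ProducerRecord<>("user-events",
                        "user-456-email-set", "set email email2@dns"));
            }
        }
    }

One thing I am not sure about with this scheme: since the creation and email events use different keys, the default partitioner may put them in different partitions, which would break the per-entity ordering I need. Would I have to use a custom partitioner that partitions on the user id instead?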