The first and third points here aren't very fair -- they apply to all data systems. Systems downstream of your database can lose data in the same way: the database's retention policy expires old data, a downstream system fails, and back to the tapes you must go. Likewise with the third point, a bug in any ETL system can cause problems. Neither is specific to streaming in general or to Kafka/Flink specifically.
I'm much more curious about the second claim. The whole point of high availability in these systems is to not lose data during failure. The post's author isn't specific on any of these points, but just as I look to a distributed-database community to prove to me that it doesn't lose data in these corner cases, so too do I expect Kafka to prove it is resilient. In the absence of formally verified software, I look for empirical evidence in the form of chaos-monkey-style tests.

On Wednesday, November 11, 2015, Welly Tambunan <if05...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for your response.
>
> We are trying to justify whether it's enough to use the Kappa Architecture
> with Flink. This is more about resiliency, message-loss issues, etc.
>
> The article worries about message loss even when using Kafka:
>
> No matter the message queue or broker you rely on, whether it be RabbitMQ,
> JMS, ActiveMQ, WebSphere, MSMQ, or yes, even Kafka, you can lose messages
> in any of the following ways:
>
> - A downstream system from the broker can have data loss.
> - All message queues today can lose already-acknowledged messages
>   during failover or leader election.
> - A bug can send the wrong messages to the wrong systems.
>
> Cheers
>
> On Wed, Nov 11, 2015 at 4:13 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> Can you explain a little more what you want to achieve? Maybe then we can
>> give a few more comments...
>>
>> I briefly read through some of the articles you linked, but did not quite
>> understand their train of thought.
>> For example, letting Tomcat write to Cassandra directly, and to Kafka,
>> might just be redundant. Why not let the streaming job that reads the
>> Kafka queue move the data to Cassandra as one of its results? Furthermore,
>> durably storing the sequence of events is exactly what Kafka does, but the
>> article suggests using Cassandra for that, which I find very
>> counterintuitive.
>> It looks a bit like the suggested approach is only adopting streaming for
>> half the task.
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Nov 10, 2015 at 7:49 AM, Welly Tambunan <if05...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I read a couple of articles about the Kappa and Lambda Architectures.
>>>
>>> http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink/
>>>
>>> I'm convinced that Flink will simplify this with streaming.
>>>
>>> However, I also stumbled upon this blog post, which has a valid argument
>>> for having a system-of-record storage (event sourcing), and finally the
>>> lambda architecture appears as the solution. Basically it writes twice,
>>> to the queuing system and to C*, for safety. The system of record here is
>>> basically storing the events (deltas).
>>>
>>> [image: Inline image 1]
>>>
>>> https://lostechies.com/ryansvihla/2015/09/17/event-sourcing-and-system-of-record-sane-distributed-development-in-the-modern-era-2/
>>>
>>> Another approach uses the lambda architecture to maintain the
>>> correctness of the system:
>>>
>>> https://lostechies.com/ryansvihla/2015/09/17/real-time-analytics-with-spark-streaming-and-cassandra/
>>>
>>> Given that he's using Spark for the stream processor, do we have to
>>> do the same thing with Apache Flink?
>>>
>>> Cheers
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
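On the second claim specifically: how much Kafka loses during failover depends heavily on configuration, not just on the software. As a hedged sketch (defaults have shifted across versions, so treat the numbers as illustrative rather than prescriptive), these are the topic- and producer-side settings usually cited for durability over availability:

```properties
# Topic/broker side: keep three replicas, require two of them to be
# in sync before a write is acknowledged, and never elect an
# out-of-sync replica as leader (trades availability for durability).
replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer side: an ack only counts once all in-sync replicas have
# the write, and transient send failures are retried.
acks=all
retries=3
```

Even with these settings, the window is only narrowed, not closed, which is exactly why the chaos-style empirical testing mentioned above still matters.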
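Stephan's objection to the dual write (Tomcat writing to both Kafka and Cassandra) boils down to a single-writer pattern: the application appends only to the log, and one streaming job derives the Cassandra view from it, so the two stores can't silently diverge. A toy, stdlib-only sketch of that idea (the `log`, `view`, `produce`, and `consume` names are illustrative stand-ins, not any real Kafka or Flink API):

```python
from collections import deque

# The durable log stands in for Kafka; the dict stands in for the
# Cassandra table that queries are served from.
log = deque()   # append-only system of record
view = {}       # derived, queryable store

def produce(event):
    """The application writes ONCE, to the log only -- no dual write."""
    log.append(event)

def consume():
    """A single streaming job derives the view from the log.

    Because the view is (re)built from the log, it can be dropped and
    replayed at any time, which is what makes the log the system of
    record -- the failure mode of two independent writes disagreeing
    simply cannot occur.
    """
    while log:
        event = log.popleft()
        view[event["key"]] = event["value"]

produce({"key": "user1", "value": "clicked"})
produce({"key": "user2", "value": "purchased"})
consume()
```

This is the shape of the job Stephan describes: the Kafka-reading streaming job writes to Cassandra as one of its results, rather than the web tier writing to both.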