> Would there be any advantage to using the kafka connect method?

The advantage is to decouple the data processing (which you do in your app)
from the responsibility of making the processing results available to one
or more downstream systems, like Cassandra.

For example, what will your application (that uses Kafka Streams) do if
Cassandra is unavailable or slow?  Will you retry, and if so -- for how
long?  Retrying writes to external systems means that the time spent doing
this will not be spent on processing the next input records, thus
increasing the latency of your processing topology.  At this point you have
coupled your app to the availability of both Kafka and Cassandra.
Upgrading or doing maintenance on Cassandra will now also mean there's
potential impact on your app.



On Thu, Oct 20, 2016 at 9:39 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Michael,
>
> Would there be any advantage to using the kafka connect method? Seems like
> it'd just add an extra step of overhead?
>
> On Thu, Oct 20, 2016 at 12:35 PM, Michael Noll <mich...@confluent.io>
> wrote:
>
> > Ali,
> >
> > my main feedback is similar to what Eno and Dave have already said.  In
> > your situation, options like these are what you'd currently need to do
> > since you are writing directly from your Kafka Stream app to Cassandra,
> > rather than writing from your app to Kafka and then using Kafka Connect
> to
> > ingest into Cassandra.
> >
> >
> >
> > On Wed, Oct 19, 2016 at 11:03 PM, Ali Akhtar <ali.rac...@gmail.com>
> wrote:
> >
> > > Yeah, I did think to use that method, but as you said, it writes to a
> > dummy
> > > output topic, which means I'd have to put in magic code just for the
> > tests
> > > to pass (the actual code writes to cassandra and not to a dummy topic).
> > >
> > >
> > > On Thu, Oct 20, 2016 at 2:00 AM, Tauzell, Dave <
> > > dave.tauz...@surescripts.com
> > > > wrote:
> > >
> > > > For similar queue related tests we put the check in a loop.  Check
> > every
> > > > second until either the result is found or a timeout  happens.
> > > >
> > > > -Dave
> > > >
> > > > -----Original Message-----
> > > > From: Ali Akhtar [mailto:ali.rac...@gmail.com]
> > > > Sent: Wednesday, October 19, 2016 3:38 PM
> > > > To: users@kafka.apache.org
> > > > Subject: How to block tests of Kafka Streams until messages
> processed?
> > > >
> > > > I'm using Kafka Streams, and I'm attempting to write integration
> tests
> > > for
> > > > a stream processor.
> > > >
> > > > The processor listens to a topic, processes incoming messages, and
> > writes
> > > > some data to Cassandra tables.
> > > >
> > > > I'm attempting to write a test which produces some test data, and
> then
> > > > checks whether or not the expected data was written to Cassandra.
> > > >
> > > > It looks like this:
> > > >
> > > > - Step 1: Produce data in the test
> > > > - Step 2: Kafka stream gets triggered
> > > > - Step 3: Test checks whether cassandra got populated
> > > >
> > > > The problem is, Step 3 is occurring before Step 2, and as a result,
> the
> > > > test fails as it doesn't find the data in the table.
> > > >
> > > > I've resolved this by adding a Thread.sleep(2000) call after Step 1,
> > > which
> > > > ensures that Step 2 gets triggered before Step 3.
> > > >
> > > > However, I'm wondering if there's a more reliable way of blocking the
> > > test
> > > > until Kafka stream processor gets triggered?
> > > >
> > > > At the moment, I'm using 1 thread for the processor. If I increase
> that
> > > to
> > > > 2 threads, will that achieve what I want?
> > > > This e-mail and any files transmitted with it are confidential, may
> > > > contain sensitive information, and are intended solely for the use of
> > the
> > > > individual or entity to whom they are addressed. If you have received
> > > this
> > > > e-mail in error, please notify the sender by reply e-mail immediately
> > and
> > > > destroy all copies of the e-mail and any attachments.
> > > >
> > >
> >
>

Reply via email to