Hi Tauzell,

Yes, our users want to query and run aggregations on Elasticsearch
directly, so we cannot have inconsistent data. For example, if a write made
it to Elasticsearch but not to Cassandra, even a simple aggregation like a
count will return the wrong answer. But, as @Hans pointed out, this is no
longer a Kafka question. Your solution has its own merits, which I really
appreciate! It does make writes faster, probably with some performance
penalty on the read side, given that repairs would happen during the read
stage in Cassandra. We could check both stores, but since our users query
Elasticsearch directly, there is no way for us to check the result against
Cassandra. Otherwise we could go with your solution as well.
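
To make the trade-off concrete, here is a rough sketch of the kind of
check the offset idea would enable, assuming every record carries the Kafka
offset it came from in both stores. The class name, method name and the
two maps are hypothetical: the maps stand in for "id -> offset" lookups
against Cassandra and Elasticsearch, and no real client API is used.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
class OffsetCheck {
    // Returns the ids whose stored Kafka offset differs between the two
    // stores, or that are present in only one of them.
    static Set<String> mismatchedIds(Map<String, Long> cassandraOffsets,
                                     Map<String, Long> esOffsets) {
        Set<String> allIds = new TreeSet<>(cassandraOffsets.keySet());
        allIds.addAll(esOffsets.keySet());

        Set<String> mismatched = new TreeSet<>();
        for (String id : allIds) {
            Long c = cassandraOffsets.get(id);
            Long e = esOffsets.get(id);
            // A missing entry on either side counts as a mismatch.
            if (c == null || !c.equals(e)) {
                mismatched.add(id);
            }
        }
        return mismatched;
    }
}
```

A check like this only helps when something sits between the user and the
stores. In our case users hit Elasticsearch directly, so nothing would run
it at query time.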

Basically, we use ES as an index for Cassandra, since secondary indexes in
Cassandra (including the latest implementation, SASI) don't work for our
use case: we have high-cardinality columns (meaning every value in the
column is unique), so an index on such a column is not very efficient given
the underlying data structure used by SASI, whereas the inverted index used
by ES is much faster.
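
For anyone unfamiliar with the term, below is a toy sketch of what an
inverted index does: it maps each column value to the set of row ids that
contain it, so a lookup or count is a single map access. This is only an
in-memory illustration with made-up names; it says nothing about how ES
or SASI are actually implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class, for illustration only.
class InvertedIndex {
    // value -> ids of the rows that hold this value
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Registers that the given row contains the given value.
    void add(int rowId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(rowId);
    }

    // Returns the ids of all rows holding the value (empty if none).
    Set<Integer> lookup(String value) {
        return postings.getOrDefault(value, Collections.emptySet());
    }

    // Counts matching rows without scanning the rows themselves.
    int count(String value) {
        return lookup(value).size();
    }
}
```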

We do use Apache Spark along with Cassandra, and I am exploring Succinct
(http://succinct.cs.berkeley.edu/wp/wordpress/). If everything works out
with Succinct, we can get rid of Elasticsearch. The one thing I worry
about, and am still testing with Spark, Cassandra and Succinct, is whether
aggregations/computations over a column, or a search on a particular
Cassandra field/column, can happen in real time on a big dataset (with ES
they do, so the goal is to see if we can get somewhere close or perform
even better).

Thanks!



On Mon, Nov 7, 2016 at 8:57 AM, Tauzell, Dave <dave.tauz...@surescripts.com>
wrote:

> Here is a scenario where this could be useful:
>
>    Add the kafka offset as a field on the record in both Cassandra and
> Elasticsearch
>
> Now when you get search results from Elastic search and look up details in
> Cassandra you can know if they come from the same kafka record.   If you
> can use the offset as part of the Cassandra Partition key, or as a
> clustering key, then you could specifically retrieve a version of the
> record from Cassandra that matches Elasticsearch (assuming it made it).
>
> If your real goal is to guarantee that the two datasets always have the
> same set of messages from Kafka ... I don't think this is possible.
>
> -Dave
>
> -----Original Message-----
> From: kant kodali [mailto:kanth...@gmail.com]
> Sent: Monday, November 7, 2016 10:48 AM
> To: users@kafka.apache.org
> Subject: Re: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Hi AmirHossein,
>
> I still don't see how that guarantees consistency at any given time. In
> other words, how do I know that at time X the data in Cassandra and ES
> are the same?
>
> Thanks
>
>
> On Mon, Nov 7, 2016 at 3:26 AM, AmirHossein Roozbahany <
> diver...@outlook.com
> > wrote:
>
> > Hi
> >
> > Can you use Elasticsearch's _version field as Cassandra's
> > writetime? (_version is strictly increasing, and Cassandra uses
> > writetime for applying LWW, so the last write in Elasticsearch will
> > always win.)
> >
> > It needs no transaction and makes databases convergent.
> >
> >
> > ________________________________
> > From: kant kodali <kanth...@gmail.com>
> > Sent: Monday, November 7, 2016 3:08 AM
> > To: users@kafka.apache.org
> > Subject: Re: is there a way to make sure two consumers receive the
> > same message from the broker?
> >
> > Hi Hans,
> >
> > The two storages we use are Cassandra and Elastic search and they are
> > on the same datacenter for now.
> > The Programming Language we use is Java and OS would be Ubuntu or CentOS.
> > We get messages in JSON format so we insert into Elastic Search
> > directly and for Cassandra we transform JSON message into appropriate
> > model so we could insert into a Cassandra table.
> > The rate we currently get is about 100K/sec, which is awesome, but I am
> > pretty sure this will go down once we implement 2PC or
> > transactional writes.
> >
> > Thanks,
> > kant
> >