Consider adding a log_bucket timestamp column and then indexing it. Your data loader can then SELECT * FROM logs WHERE log_bucket = ?. The value you supply there would be the timestamp bucket you're processing: in your case, logged_at rounded down to a 5-second boundary, e.g. logged_at - (logged_at % 5000) for millisecond epoch timestamps.
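A minimal sketch of what the loader side could look like with the DataStax Java Driver 2.x, assuming you've added and indexed the column (ALTER TABLE logs ADD log_bucket timestamp; CREATE INDEX ON logs (log_bucket)). The contact point and keyspace name are placeholders, and the 5000 ms bucket math assumes logged_at is a millisecond epoch timestamp:

    import java.util.Date;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class BucketLoader {

        // Floor a millisecond epoch timestamp to its 5-second bucket.
        static long bucketFor(long epochMillis) {
            return epochMillis - (epochMillis % 5000L);
        }

        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("logs_ks"); // placeholder keyspace

            // Process the bucket that closed on the previous tick, not the
            // one writers may still be filling.
            long bucket = bucketFor(System.currentTimeMillis() - 5000L);
            ResultSet rs = session.execute(
                    "SELECT * FROM logs WHERE log_bucket = ?", new Date(bucket));
            for (Row row : rs) {
                // ship row.getString("txn_id"), row.getString("params"), etc.
                // to elasticsearch here
            }
            cluster.close();
        }
    }

Your writers would populate log_bucket with the same bucketFor(logged_at) value at insert time, so each row lands in exactly one bucket and is picked up by exactly one iteration.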
However, I'll caution against writing data to Cassandra and then trying to read it back reliably immediately afterwards. Due to eventual consistency you're likely to miss values this way unless you read at CL_ALL, but then your data loader will break whenever you have any node offline. Writing and then immediately reading data back is a typical antipattern in any eventually consistent system.

If you're using the DataStax Java Driver, you can combine CL_ALL with DowngradingConsistencyRetryPolicy (a sketch of the wiring is at the bottom of this message). That would at least strike a reasonable balance between fairly strong consistency and the resiliency you give up with CL_ALL, though when a node is offline your load process may get significantly slower. This mitigates the antipattern but doesn't eliminate it.

On Tue Nov 25 2014 at 2:11:36 AM Vinod Joseph <vinodjosep...@gmail.com> wrote:

> Hi,
>
> I am working on a java plugin which moves data from cassandra to
> elasticsearch. This plugin must run on the server every 5 seconds. The
> data is getting moved, but the issue is that every time the plugin runs
> (i.e. after every 5 seconds) all the data, including data which was
> already moved into elasticsearch in the previous iteration, is moved
> again. So we are getting duplicate values in elasticsearch. How can we
> avoid this problem?
>
> We are using this plugin to manage logs which are generated during
> online transactions, so we will be having millions of transactions.
> Following is the table schema:
>
> CREATE TABLE logs (
>     txn_id text,
>     logged_at timestamp,
>     des text,
>     key_name text,
>     params text,
>     PRIMARY KEY (txn_id, logged_at)
> )
>
> The txn_id is a 16-digit number and is not unique. It is a combination
> of 6 random digits generated using a random function, followed by the
> epoch timestamp in milliseconds (10 digits).
>
> I want to move only the data which has been generated since the previous
> run, and not the data which was already moved in the previous run. I
> tried to do it with static values, counter variables, comparing the
> write_time of each row, and ORDER BY. Still it's not working. Please
> suggest any ideas.
>
> Thanks and regards
> vinod joseph
> 8050136948
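For completeness, here is a sketch of the CL_ALL plus downgrading-retry wiring mentioned above (DataStax Java Driver 2.x; the contact point is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

    public class AllWithDowngrade {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    // Default all reads/writes to CL_ALL...
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.ALL))
                    // ...but retry at a lower consistency level when
                    // not enough replicas are available.
                    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                    .build();
            Session session = cluster.connect();
            // ... run the loader queries with this session ...
            cluster.close();
        }
    }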