Hi,

            I am working on a Java plugin that moves data from Cassandra
to Elasticsearch. The plugin runs on the server every 5 seconds.
The data does get moved, but every time the plugin runs (i.e. every 5
seconds) it moves all the data, including rows that were already moved
to Elasticsearch in previous iterations. So we end up with duplicate
documents in Elasticsearch. How can we avoid this problem?
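One common way around this (a sketch, not your plugin's actual code; the class and method names here are made up) is to make the Elasticsearch writes idempotent: index each row under a deterministic _id derived from the Cassandra primary key (txn_id, logged_at). Elasticsearch treats an index request for an existing _id as an overwrite of that document, so re-sending a row that was already moved replaces the same document instead of creating a duplicate:

```java
// Sketch: derive a deterministic Elasticsearch _id from the Cassandra
// primary key (txn_id, logged_at). Indexing the same row twice then
// overwrites the same document instead of creating a duplicate.
public class DocIds {

    // logged_at passed as epoch millis; the ":" separator just keeps
    // the two key parts unambiguous in the combined id.
    public static String deterministicId(String txnId, long loggedAtMillis) {
        return txnId + ":" + loggedAtMillis;
    }

    public static void main(String[] args) {
        String id1 = deterministicId("1234561476354879", 1476354879123L);
        String id2 = deterministicId("1234561476354879", 1476354879123L);
        // The same row always maps to the same _id.
        System.out.println(id1.equals(id2)); // true
        System.out.println(id1);             // 1234561476354879:1476354879123
    }
}
```

This does not stop the plugin from re-reading old rows, but it makes the duplicates harmless, which is often enough for a log store.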

We are using this plugin to manage logs that are generated during online
transactions, so we will have millions of transactions.
The table schema is as follows:

CREATE TABLE logs (
  txn_id text,
  logged_at timestamp,
  des text,
  key_name text,
  params text,
  PRIMARY KEY (txn_id, logged_at)
)

The txn_id is a 16-digit number and is not unique. It is a combination of 6
random digits generated using a random function, followed by the epoch
timestamp in milliseconds (10 digits).
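Since the trailing 10 digits of txn_id are described above as an epoch timestamp, the row's creation time can be recovered from the id itself. A minimal sketch, assuming every txn_id is exactly 6 random digits followed by the 10 timestamp digits:

```java
// Sketch: recover the embedded epoch timestamp from a txn_id laid out as
// 6 random digits followed by a 10-digit epoch value, as described above.
public class TxnIds {

    public static long embeddedEpoch(String txnId) {
        if (txnId.length() != 16) {
            throw new IllegalArgumentException("expected a 16-digit txn_id");
        }
        // Skip the 6 random digits; parse the remaining 10 as the epoch value.
        return Long.parseLong(txnId.substring(6));
    }

    public static void main(String[] args) {
        System.out.println(embeddedEpoch("1234561476354879")); // 1476354879
    }
}
```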

I want to move only the data that was generated after the previous run,
and not the data that was already moved in a previous run.
I have tried static values, counter variables, comparing the writetime of
each row, and ORDER BY, but it is still not working. Please suggest
any ideas.
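The counter and writetime approaches above can be combined into a single high-water-mark check: remember the largest logged_at value forwarded so far, and on each run ship only rows newer than it. A minimal sketch (the Row class and the in-memory watermark are stand-ins for illustration; in the real plugin the watermark would have to be persisted, e.g. in its own Cassandra table, so it survives a restart):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a high-water-mark filter. Each run forwards only rows whose
// logged_at is strictly greater than the largest logged_at already sent.
public class WatermarkMover {

    // Stand-in for a Cassandra row; only the fields the filter needs.
    public static class Row {
        final String txnId;
        final long loggedAtMillis;
        public Row(String txnId, long loggedAtMillis) {
            this.txnId = txnId;
            this.loggedAtMillis = loggedAtMillis;
        }
    }

    // In the real plugin this should be persisted (e.g. in its own
    // Cassandra table) so it survives a restart of the server.
    private long watermarkMillis = Long.MIN_VALUE;

    // Returns only the rows newer than the watermark, then advances it.
    public List<Row> newRows(List<Row> fetched) {
        List<Row> toShip = new ArrayList<>();
        long max = watermarkMillis;
        for (Row r : fetched) {
            if (r.loggedAtMillis > watermarkMillis) {
                toShip.add(r);
                if (r.loggedAtMillis > max) {
                    max = r.loggedAtMillis;
                }
            }
        }
        watermarkMillis = max;
        return toShip;
    }

    public static void main(String[] args) {
        WatermarkMover mover = new WatermarkMover();
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a", 100L));
        rows.add(new Row("b", 200L));
        System.out.println(mover.newRows(rows).size()); // first run: 2

        // Second run: same two rows come back, plus one new one.
        rows.add(new Row("c", 300L));
        System.out.println(mover.newRows(rows).size()); // only "c": 1
    }
}
```

One caveat: a strictly-greater comparison can skip a late row that happens to share the watermark timestamp, so it is safest to also make the Elasticsearch writes idempotent rather than rely on the watermark alone.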


Thanks and regards
vinod joseph
8050136948
