Hi, I am working on a Java plugin that moves data from Cassandra to Elasticsearch. The plugin must run on the server every 5 seconds. The data does get moved, but the problem is that every time the plugin runs (i.e. every 5 seconds) it moves all of the data again, including rows that were already copied to Elasticsearch in previous iterations. So we end up with duplicate documents in Elasticsearch. How can I avoid this?
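To make the duplication concrete, here is a toy simulation (all names are mine, a plain Map stands in for the Elasticsearch index, not the real plugin code): if every run re-indexes every row under a fresh auto-generated ID, duplicates pile up, whereas an ID derived from the row's primary key makes re-indexing an overwrite instead of an append.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a Map simulating an Elasticsearch index.
// A real call would use the Elasticsearch index API with an explicit _id.
public class IdempotentIndex {
    private final Map<String, String> docs = new HashMap<>();

    // Deterministic document ID derived from the Cassandra primary key
    // (txn_id, logged_at), so the same row always maps to the same ID.
    public static String docId(String txnId, long loggedAtMillis) {
        return txnId + "_" + loggedAtMillis;
    }

    public void index(String txnId, long loggedAtMillis, String body) {
        docs.put(docId(txnId, loggedAtMillis), body); // upsert, not append
    }

    public int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        IdempotentIndex es = new IdempotentIndex();
        es.index("1234561478000000", 1478000000000L, "first run");
        es.index("1234561478000000", 1478000000000L, "second run, same row");
        System.out.println(es.size()); // still 1: no duplicate
    }
}
```

With auto-generated IDs the second call would have created a second document; with the derived ID it simply overwrites the first.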
We are using this plugin to manage logs generated during online transactions, so there will be millions of transactions. This is the table schema:

```
CREATE TABLE logs (
    txn_id text,
    logged_at timestamp,
    des text,
    key_name text,
    params text,
    PRIMARY KEY (txn_id, logged_at)
);
```

The txn_id is a 16-digit number and is not unique: it is 6 digits generated with a random function, followed by the 10-digit epoch timestamp in milliseconds. I want to move only the data generated since the previous run, not the data that was already moved. I have tried static values, counter variables, comparing the WRITETIME of each row, and ORDER BY, but it is still not working. Please suggest any ideas.

Thanks and regards,
Vinod Joseph
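For reference, the kind of checkpoint logic I have been attempting with the write time of each row, reduced to plain Java (Row, rowsToShip, and the timestamp values are illustrative; in the real plugin the rows come from the Cassandra driver and WRITETIME(column) supplies the microsecond value):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a checkpoint-based mover: remember the newest write time
// already shipped, and on each 5-second run forward only rows written
// strictly after it, so each row goes to Elasticsearch at most once.
public class LogMover {
    // Stand-in for a row returned from Cassandra; writeTimeMicros plays
    // the role of WRITETIME(column), which CQL reports in microseconds.
    public static class Row {
        public final String txnId;
        public final long writeTimeMicros;
        public Row(String txnId, long writeTimeMicros) {
            this.txnId = txnId;
            this.writeTimeMicros = writeTimeMicros;
        }
    }

    private long checkpointMicros = 0L;

    // Returns only rows newer than the checkpoint, then advances it.
    public List<Row> rowsToShip(List<Row> fetched) {
        List<Row> fresh = new ArrayList<>();
        long maxSeen = checkpointMicros;
        for (Row r : fetched) {
            if (r.writeTimeMicros > checkpointMicros) {
                fresh.add(r);
                if (r.writeTimeMicros > maxSeen) {
                    maxSeen = r.writeTimeMicros;
                }
            }
        }
        checkpointMicros = maxSeen;
        return fresh;
    }

    public static void main(String[] args) {
        LogMover mover = new LogMover();
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("1234560000000001", 100L));
        rows.add(new Row("1234560000000002", 200L));
        System.out.println(mover.rowsToShip(rows).size()); // first run: 2
        rows.add(new Row("1234560000000003", 300L));
        System.out.println(mover.rowsToShip(rows).size()); // second run: 1
    }
}
```

In isolation this logic filters correctly; the checkpoint would need to survive across plugin runs (e.g. persisted somewhere) for it to work in the real scheduler.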