Re: ordered ingestion not guaranteed

2018-05-11 Thread ravidspark
Jorn, Thanks for the response. My downstream database is Kudu. 1. Yes. As you have suggested, I have been using a central caching mechanism that caches the rdd results and to make a comparison with the next batch to check for the latest timestamps and ignore the old timestamps. But, I see handlin

Re: ordered ingestion not guaranteed

2018-05-11 Thread Jörn Franke
What DB do you have? You have some options, such as 1) use a key value store (they can be accessed very efficiently) to see if there has been a newer key already processed - if yes then ignore value if no then insert into database 2) redesign the key to include the timestamp and find out the la