Have you raised this as an issue on the ES-Hadoop connector GitHub? In my past experience (with the Hadoop connector and Pig), they respond pretty quickly.
On Tue, Oct 10, 2017 at 12:36 AM, sixers <buskiew...@gmail.com> wrote:

> ### Issue description
>
> We have a data-consistency issue when storing data in Elasticsearch using
> Spark and the elasticsearch-spark connector. The job finishes successfully,
> but when we compare the original data (stored in S3) with the data stored
> in ES, some documents are missing from Elasticsearch.
>
> ### Steps to reproduce
>
> This issue doesn't always happen, and unfortunately we cannot reproduce it
> on demand. The only indicator we have found that correlates with
> occurrences of this bug is the presence of a failed stage while saving data
> to Elasticsearch. Jobs with this stage failure eventually complete
> successfully, but the data is inconsistent.
>
> We use the following configuration:
>
> - Elasticsearch:
>   - "es.write.operation": "index"
>   - "es.nodes.discovery": "false"
>   - "es.nodes.wan.only": "true"
> - Spark:
>   - write mode: "append"
>
> ### Version Info
>
> - OS: Amazon Linux
> - JVM: 1.8
> - Hadoop/Spark: Hadoop 2.7.3 (Amazon), Spark 2.2.0
> - ES-Hadoop: elasticsearch-spark-20_2.11:5.5.2
> - ES: 5.3 (Amazon Elasticsearch Service)
>
> ### Questions
>
> I'm looking for some guidance on debugging this issue.
>
> 1. Why doesn't Elasticsearch have all the data, even though Spark reports
>    that the job finished and the data was saved?
> 2. What can we do to ensure that we write data to ES in a consistent
>    manner?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

--
Best Regards,
Ayan Guha
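For reference, the write path described in the quoted post roughly corresponds to a DataFrame write through the elasticsearch-spark connector. This is a sketch, not the poster's actual code: the DataFrame `df`, the endpoint, and the index resource `myindex/mytype` are hypothetical, while the option keys and values are the ones listed in the post.

```scala
// Sketch of the configuration from the post (not the poster's exact job).
// `df`, the endpoint, and "myindex/mytype" are hypothetical placeholders.
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.write.operation", "index")          // from the post
  .option("es.nodes.discovery", "false")          // from the post
  .option("es.nodes.wan.only", "true")            // from the post
  .option("es.nodes", "https://my-domain.es.amazonaws.com") // hypothetical
  .mode("append")                                 // write mode from the post
  .save("myindex/mytype")
```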
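Regarding question 2, one connector-agnostic check is to export the document IDs from both sides (e.g. the keys present in the S3 data vs. a scroll over the ES index) and diff the sets, then re-index whatever is missing. A minimal sketch, assuming the ID lists have already been exported; the function name and sample data are illustrative:

```python
def missing_ids(source_ids, es_ids):
    """Return the ids present in the source (S3) but absent from ES.

    Both arguments are iterables of document ids, e.g. exported from the
    S3 data and from a scroll over the Elasticsearch index.
    """
    return sorted(set(source_ids) - set(es_ids))


# Illustrative data: "c" was lost during the failed-stage write.
source = ["a", "b", "c", "d"]
indexed = ["a", "b", "d"]
print(missing_ids(source, indexed))  # -> ['c']
```

Because the post uses `"es.write.operation": "index"`, re-writing just the missing documents is idempotent for documents that already exist, so a repair pass over the diff is safe.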