### Issue description

We have a data-consistency issue when storing data in Elasticsearch using Spark and the elasticsearch-spark connector. The job finishes successfully, but when we compare the original data (stored in S3) with the data stored in ES, some documents are missing from Elasticsearch.
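The comparison described above (source data vs. what actually landed in ES) can be sketched as a simple set difference over document IDs. This is a minimal, self-contained illustration; the function and variable names are hypothetical, not from the original post:

```python
# Sketch: given document IDs extracted from the S3 source and from
# Elasticsearch, report which documents never made it into the index.
# Names here are illustrative only.

def missing_documents(source_ids, es_ids):
    """Return IDs present in the source data but absent from Elasticsearch."""
    return sorted(set(source_ids) - set(es_ids))

# Example: two documents were never indexed.
source_ids = ["doc-1", "doc-2", "doc-3", "doc-4"]
es_ids = ["doc-1", "doc-3"]
print(missing_documents(source_ids, es_ids))  # ['doc-2', 'doc-4']
```

In practice the ID lists would come from reading the S3 data and querying ES, but the diff itself is what pinpoints the missing documents.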
### Steps to reproduce

This issue does not always happen and, unfortunately, we cannot reproduce it on demand. The only indicator we have found that correlates with occurrences of this bug is the presence of a failed stage while saving data to Elasticsearch. Jobs with this stage failure eventually complete successfully, but the data is inconsistent.

We use the following configuration:

- Elasticsearch:
  - `"es.write.operation": "index"`
  - `"es.nodes.discovery": "false"`
  - `"es.nodes.wan.only": "true"`
- Spark:
  - write mode: `"append"`

### Version Info

- OS: Amazon Linux
- JVM: 1.8
- Hadoop/Spark: Hadoop 2.7.3 (Amazon), Spark 2.2.0
- ES-Hadoop: elasticsearch-spark-20_2.11:5.5.2
- ES: 5.3 (Amazon Elasticsearch Service)

### Questions

I'm looking for guidance on how to debug this issue:

1. Why doesn't Elasticsearch contain all the data, even though Spark reports that the job finished and saved the data?
2. What can we do to ensure that we write data to ES in a consistent manner?
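For reference, the write configuration listed above can be expressed as a PySpark write, roughly like this. This is a sketch only: it assumes the elasticsearch-spark-20 connector is on the classpath, and the S3 path, ES endpoint, and index name are placeholders, not from the original post:

```python
# Sketch of the Spark -> Elasticsearch write described above.
# Placeholders (not from the original post): the S3 path, the ES endpoint,
# and the index/type resource name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-es").getOrCreate()

df = spark.read.parquet("s3://bucket/path/")  # hypothetical source data

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.write.operation", "index")
   .option("es.nodes.discovery", "false")
   .option("es.nodes.wan.only", "true")
   .option("es.nodes", "https://example-domain.es.amazonaws.com")  # placeholder
   .mode("append")
   .save("myindex/mytype"))  # placeholder index/type
```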