Have you raised it as an issue on the ES connector GitHub? In my past
experience (with the Hadoop connector and Pig), they respond pretty quickly.
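For context, a minimal sketch of how that write path is usually wired up with
the options you listed below (the endpoint and index name here are
hypothetical placeholders, not from your report):

```python
# Sketch: writing a Spark DataFrame to Elasticsearch via the
# elasticsearch-hadoop connector, using the settings from the report.
es_options = {
    "es.write.operation": "index",   # index op: create or replace doc by id
    "es.nodes.discovery": "false",   # don't sniff cluster nodes
    "es.nodes.wan.only": "true",     # required for hosted ES behind a LB
    # Hypothetical endpoint -- replace with your domain:
    "es.nodes": "https://search-mydomain.us-east-1.es.amazonaws.com",
}

# The actual write needs a live SparkSession and ES cluster, so it is
# shown commented out here:
# df.write.format("org.elasticsearch.spark.sql") \
#     .options(**es_options) \
#     .mode("append") \
#     .save("myindex/mytype")
```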

On Tue, Oct 10, 2017 at 12:36 AM, sixers <buskiew...@gmail.com> wrote:

> ### Issue description
>
> We have an issue with data consistency when storing data in Elasticsearch
> using Spark and the elasticsearch-spark connector. The job finishes
> successfully, but when we compare the original data (stored in S3) with the
> data stored in ES, some documents are missing from Elasticsearch.
>
> ### Steps to reproduce
>
> This issue doesn't always happen and unfortunately we cannot reproduce it
> on demand. The only indicator we found that correlates with occurrences of
> this bug is the presence of a failed stage while saving data to
> Elasticsearch. Jobs with this stage failure eventually complete
> successfully, but the data is inconsistent.
>
> We use the following configuration:
>
> - Elasticsearch:
>   - "es.write.operation": "index"
>   - "es.nodes.discovery": "false"
>   - "es.nodes.wan.only": "true"
> - Spark:
>   - write mode: "append"
>
> ### Version Info
>
> - OS          :  Amazon Linux
> - JVM         :  1.8
> - Hadoop/Spark:  Hadoop 2.7.3 (Amazon), Spark 2.2.0
> - ES-Hadoop   :  elasticsearch-spark-20_2.11:5.5.2
> - ES          :  5.3 (Amazon Elasticsearch Service)
>
> ### Questions
>
> I'm looking for some guidance in order to debug this issue.
>
> 1. I want to understand why Elasticsearch doesn't have all the data, even
> though Spark says it finished the job and saved the data.
> 2. What can we do to ensure that we write data to ES in a consistent
> manner?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
Best Regards,
Ayan Guha
