Have you raised this as an issue on the ES-Hadoop connector GitHub? In my past experience (with the Hadoop connector and Pig), they respond pretty quickly.
On Tue, Oct 10, 2017 at 12:36 AM, sixers <buskiew...@gmail.com> wrote:

> ### Issue description
>
> We have a data-consistency issue when storing data in Elasticsearch using
> Spark and the elasticsearch-spark connector. The job finishes successfully,
> but when we compare the original data (stored in S3) with the data stored
> in ES, some documents are missing from Elasticsearch.
>
> ### Steps to reproduce
>
> This issue doesn't always happen, and unfortunately we cannot reproduce it
> on demand. The only indicator we have found that correlates with
> occurrences of this bug is the presence of a failed stage while saving data
> to Elasticsearch. Jobs with this stage failure eventually complete
> successfully, but the data is inconsistent.
>
> We use the following configuration:
>
> - Elasticsearch:
>   - "es.write.operation": "index"
>   - "es.nodes.discovery": "false"
>   - "es.nodes.wan.only": "true"
> - Spark:
>   - write mode: "append"
>
> ### Version Info
>
> - OS: Amazon Linux
> - JVM: 1.8
> - Hadoop/Spark: Hadoop 2.7.3 (Amazon), Spark 2.2.0
> - ES-Hadoop: elasticsearch-spark-20_2.11:5.5.2
> - ES: 5.3 (Amazon Elasticsearch Service)
>
> ### Questions
>
> I'm looking for some guidance on debugging this issue.
>
> 1. Why doesn't Elasticsearch have all the data, even though Spark reports
>    that the job finished and the data was saved?
> 2. What can we do to ensure that we write data to ES in a consistent
>    manner?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

--
Best Regards,
Ayan Guha
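For reference, the write path described in the quoted post roughly corresponds to a DataFrame write through the elasticsearch-spark connector. This is a sketch, not the poster's actual code: the DataFrame `df`, the endpoint, and the index resource `myindex/mytype` are hypothetical, while the option keys and values are the ones listed in the post.

```scala
// Sketch of the configuration from the post (not the poster's exact job).
// `df`, the endpoint, and "myindex/mytype" are hypothetical placeholders.
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.write.operation", "index")          // from the post
  .option("es.nodes.discovery", "false")          // from the post
  .option("es.nodes.wan.only", "true")            // from the post
  .option("es.nodes", "https://my-domain.es.amazonaws.com") // hypothetical
  .mode("append")                                 // write mode from the post
  .save("myindex/mytype")
```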
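Regarding question 2, one connector-agnostic check is to export the document IDs from both sides (e.g. the keys present in the S3 data vs. a scroll over the ES index) and diff the sets, then re-index whatever is missing. A minimal sketch, assuming the ID lists have already been exported; the function name and sample data are illustrative:

```python
def missing_ids(source_ids, es_ids):
    """Return the ids present in the source (S3) but absent from ES.

    Both arguments are iterables of document ids, e.g. exported from the
    S3 data and from a scroll over the Elasticsearch index.
    """
    return sorted(set(source_ids) - set(es_ids))


# Illustrative data: "c" was lost during the failed-stage write.
source = ["a", "b", "c", "d"]
indexed = ["a", "b", "d"]
print(missing_ids(source, indexed))  # -> ['c']
```

Because the post uses `"es.write.operation": "index"`, re-writing just the missing documents is idempotent for documents that already exist, so a repair pass over the diff is safe.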