Can I make an Elasticsearch Sink effectively exactly once?

Stephen Connolly Thu, 21 Feb 2019 05:27:14 -0800

>From how I understand it:

https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html#elasticsearch-sinks-and-fault-tolerance


the Flink Elasticsearch Sink guarantees at-least-once delivery of action
> requests to Elasticsearch clusters. It does so by waiting for all pending
> action requests in the BulkProcessor at the time of checkpoints. This
> effectively assures that all requests before the checkpoint was triggered
> have been successfully acknowledged by Elasticsearch, before proceeding to
> process more records sent to the sink.


So I am thinking:


   - If I put a .map(json -> json.set("_id", ElasticsearchId.generate()) in
   front of the Elasticsearch sink
   - If I have a ActionRequestFailureHandler that drops any ID conflicts on
   the floor

Would this give me exactly once output to Elasticsearch as the
BulkProcessor's checkpoint would include the "_id" and thus in the event of
a recovery the duplicates would be detected.

Or is there some additional concern I need to be aware of?

Thanks

-stephenc

Can I make an Elasticsearch Sink effectively exactly once?

Reply via email to