Re: [Discussion] flink elasticsearch connector supports

Etienne Chauchot Thu, 04 Jun 2020 07:08:22 -0700

Hi,

I made the Elasticsearch connector of Apache Beam and I was thinkingabout doing the same for Flink when I came by this discussion. I havesome comments regarding the design doc:


1. Streaming source:

ES has data streams features but only for time series data; the aim ofthis source is to read all kind of data. Apart from data streams, ESbehaves like a database: you read the content of an index (similar to atable) corresponding to the given query (similar to SQL). So, regardingstreaming changes, if there are changes between 2 read requests made bythe source, at the second the whole index (containing the change) willbe read another time. So, I see no way of having a regular flow ofdocuments updates (insertion, deletion, update) as we would need for astreaming source. Regarding failover: I guess exactly once semanticscannot be guaranteed, only at least once semantics can. Indeed there isno ack mechanism on already read data. As a conclusion, IMO you areright to target only batch source. Also this answers Yangze Guo'squestion about streaming source. Question is: can a batch only source beaccepted as a built in flink source ?


2. hadoop ecosystem

Why not use RichParallelSourceFunction ?

3. Splitting

Splitting with one split = one ES shard could lead to sub-parallelism.IMHO I think that what's important is the number of executors there arein the Flink cluster: it is better to useruntimeContext.getIndexOfThisSubtask() andruntimeContext.getMaxNumberOfParallelSubtasks() to split the input datausing ES slice API.


4. Targeting ES 5, 6, 7

In Beam I used low level REST client because it is compatible with allES versions so it allows to have the same code base for all versions.But this client is very low level (String based requests). Now, highlevel rest client exists (it was not available at the time), it is theone I would use. It is also available for ES 5 so you should use it forES 5 instead of deprecated Transport client.


Best

Etienne Chauchot.


On 04/06/2020 08:47, Yangze Guo wrote:

Hi, Jackey.

Thanks for driving this discussion. I think this proposal should be a
FLIP[1] since it impacts the public interface. However, as we have
only some preliminary discussions atm, a design draft would be ok. But
it would be better to organize your document according to [2].

I've two basic questions:
- Could your summarize all the public API and configurations (DDL) of
the ElasticSearchTableSource?
- If we want to implement ElasticSearch DataStream Source at the same
time, do we need to do a lot of extra work apart from this?

It also would be good if you could do some tests (performance and
correctness) to address Robert's comments.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template

Best,
Yangze Guo

On Wed, Jun 3, 2020 at 9:41 AM Jacky Lau <liuyon...@gmail.com> wrote:

Hi Robert Metzger:
     Thanks for your response. could you please read this docs.
https://www.yuque.com/jackylau-sc7w6/bve18l/14a2ad5b7f86998433de83dd0f8ec067
. Any Is it any problem here? we are worried about
we do not think  throughly. thanks.



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: [Discussion] flink elasticsearch connector supports

Reply via email to