Hi Jacky Lau,

1) Yes, I saw that.

2) I saw sources like IntegerSource which are bounded and which extend RichParallelSourceFunction. This is why I mentioned it.

3) True, there is a Hadoop ES connector in Beam, but it is more of a side connector. The main one is ElasticsearchIO, here: https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L156 and it does not use Hadoop.

4) Yes, but using the same client could simplify the code in the end. I agree, though, that it would require more changes to the current code.

Etienne

On 05/06/2020 05:50, Jacky Lau wrote:
Hi Etienne Chauchot,
Thanks for the discussion.
For 1) we do not support an unbounded ES source currently.

For 2) RichParallelSourceFunction is used for streaming; InputFormat is for
batch.

For 3) I downloaded Beam just now, and the Beam ES connector also uses
es-hadoop. I have read the es-hadoop code (its input split contains a shard
and a slice, and I think that is better when different shards have different
numbers of docs), which you can see here:
https://github.com/elastic/elasticsearch-hadoop. But the code is not good,
so we do not want to reference it. You can also look at Presto, which just
uses an input split per shard, without slices.
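
To make the shard-and-slice idea in point 3 concrete, here is a minimal sketch of how such splits could be enumerated. The class and method names (ShardSliceSplits, Split, enumerate) are hypothetical and are not taken from Flink, Beam, es-hadoop or Presto; the sketch only assumes that every shard is cut into the same number of slices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names here are illustrative only and do not
// come from Flink, Beam, es-hadoop or Presto.
final class ShardSliceSplits {

    /** One unit of parallel read work: a single slice of a single shard. */
    static final class Split {
        final int shardId;
        final int sliceId;
        final int slicesPerShard;

        Split(int shardId, int sliceId, int slicesPerShard) {
            this.shardId = shardId;
            this.sliceId = sliceId;
            this.slicesPerShard = slicesPerShard;
        }

        @Override
        public String toString() {
            return "shard=" + shardId + ", slice=" + sliceId + "/" + slicesPerShard;
        }
    }

    /**
     * Builds one split per (shard, slice) pair.
     * With slicesPerShard == 1 this degenerates to one split per shard.
     */
    static List<Split> enumerate(int shardCount, int slicesPerShard) {
        if (shardCount < 0 || slicesPerShard < 1) {
            throw new IllegalArgumentException("shardCount must be >= 0 and slicesPerShard >= 1");
        }
        List<Split> splits = new ArrayList<>(shardCount * slicesPerShard);
        for (int shard = 0; shard < shardCount; shard++) {
            for (int slice = 0; slice < slicesPerShard; slice++) {
                splits.add(new Split(shard, slice, slicesPerShard));
            }
        }
        return splits;
    }
}
```

Calling ShardSliceSplits.enumerate(shardCount, 1) gives the shard-only layout, while a larger slicesPerShard gives finer-grained splits that are easier to spread evenly across readers when shards differ in size.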

For 4) because the Flink ES connector already uses different clients (the
transport client for ES 5, the high-level REST client for ES 6 and 7), we
just reuse them, which will not change too much code.



