Hi Jacky Lau,

1) Yes, I saw that.
2) I saw sources like IntegerSource which are bounded and which extend RichParallelSourceFunction. This is why I mentioned it.
3) True, there is a Hadoop ES connector in Beam, but it is more of a side connector. The main one is ElasticsearchIO, here: https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L156 and it does not use Hadoop.
4) Yes, but using the same client could simplify the code in the end. I agree, though, that it needs more changes to the current code.
Etienne

On 05/06/2020 05:50, Jacky Lau wrote:
Hi Etienne Chauchot,

Thanks for the discussion.

1) We do not currently support an unbounded ES source.
2) RichParallelSourceFunction is used for streaming; InputFormat is for batch.
3) I downloaded Beam just now, and the Beam ES connector also uses es-hadoop. I have read the es-hadoop code, which you can see here: https://github.com/elastic/elasticsearch-hadoop. There, an input split contains a shard and a slice, which I think is better when different shards hold different numbers of docs. But the code is not good, so we do not want to reference it. You can also look at Presto, which just uses an input split with a shard and no slice.
4) Because the Flink ES connector already uses different clients (the transport client for ES 5, the high-level REST client for ES 6 and 7), we just reuse them, which will not change much code.