hi Jacky Lau:
I agree with Jark's point of view. ES is not used just to read data; more often it is used for grouped queries and aggregations, for example like the sketch below.
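A minimal sketch of the kind of aggregate query I mean, using the ES 6/7 high-level REST client mentioned later in this thread. The index name "my-index" and the "category" field are made up for illustration; this is a terms aggregation rather than a plain document scan:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class EsAggregationSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // size(0): we only want the aggregation buckets, not the raw hits
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchAllQuery())
                    .size(0)
                    .aggregation(AggregationBuilders.terms("by_category")
                            .field("category")); // hypothetical keyword field
            SearchResponse response = client.search(
                    new SearchRequest("my-index").source(source),
                    RequestOptions.DEFAULT);
            System.out.println(response.getAggregations().get("by_category"));
        }
    }
}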
Best,
Forward

Jacky Lau <liuyon...@gmail.com> wrote on Friday, June 5, 2020 at 2:47 PM:

> hi Etienne Chauchot:
> you can read here https://www.jianshu.com/p/d32e17dab90c, which is in
> Chinese, but from it you can see that the slice API has poor performance
> in the es-hadoop project.
>
> And I found that es-hadoop has removed this and disabled sliced scrolls
> by default; see below, which I found in the latest es-hadoop release
> notes:
>
> ==== Configuration Changes
> `es.input.use.sliced.partitions` is deprecated in 6.5.0, and will be
> removed in 7.0.0. The default value for
> `es.input.max.docs.per.partition` (100000) will also be removed in
> 7.0.0, thus disabling sliced scrolls by default, and switching them to
> be an explicitly opt-in feature.
>
> added[5.0.0]
> `es.input.max.docs.per.partition` ::
> When reading from an {es} cluster that supports scroll slicing ({es}
> v5.0.0 and above), this parameter advises the connector on what the
> maximum number of documents per input partition should be. The connector
> will sample and estimate the number of documents on each shard to be
> read and divides each shard into input slices using the value supplied
> by this property. This property is a suggestion, not a guarantee. The
> final number of documents per partition is not guaranteed to be below
> this number, but rather, they will be close to this number. This
> property is ignored if you are reading from an {es} cluster that does
> not support scroll slicing ({es} any version below v5.0.0). By default,
> this value is unset, and the input partitions are calculated based on
> the number of shards in the indices being read.
>
> Jacky Lau wrote:
> > hi Etienne Chauchot:
> > thanks for your discussion.
> > for 1) we do not support an ES unbounded source currently
> >
> > for 2) RichParallelSourceFunction is used for streaming; InputFormat
> > is for batch
> >
> > for 3) I downloaded Beam just now, and the Beam ES connector is also
> > using es-hadoop. I have read the code of es-hadoop (an input split
> > contains a shard and a slice, and I think that is better when
> > different shards have different numbers of docs), which you can see
> > here: https://github.com/elastic/elasticsearch-hadoop. But the code
> > is not good, so we do not want to reference it. And you can see that
> > Presto also just uses an input split with the shard, not containing
> > the slice.
> >
> > for 4) because the Flink ES connector is already using different
> > clients (the transport client for ES 5, the high-level REST client
> > for ES 6 and 7), we just reuse them, which will not change too much
> > code
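Regarding 3) above, to make the shard + slice split concrete: below is a minimal sketch (not the actual es-hadoop or connector code) of the scroll request one parallel reader would issue for its split, using the ES 6/7 high-level REST client. The index name, the `_shards:0` preference, and slice 0 of 2 are made up for illustration; the `_shards` preference is one way to restrict a scroll to a single shard.

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.slice.SliceBuilder;

public class SlicedShardScrollSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            SearchRequest request = new SearchRequest("my-index")
                    // restrict this reader's scroll to shard 0 of the index
                    .preference("_shards:0")
                    .scroll(TimeValue.timeValueMinutes(1));
            request.source(new SearchSourceBuilder()
                    .query(QueryBuilders.matchAllQuery())
                    // this reader handles slice 0 of 2 within that shard;
                    // a second reader would use new SliceBuilder(1, 2)
                    .slice(new SliceBuilder(0, 2))
                    .size(1000));
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            System.out.println(response.getHits().getTotalHits());
        }
    }
}

For a per-shard split without slicing (the Presto approach mentioned above), you would drop the slice(...) line and keep only the shard preference.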