Hi Jacky Lau,

I agree with Jark's point of view: ES is used not just to read data; more
often it is used for grouped queries and aggregations.
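
For example, a grouped/aggregated query through the ES high-level REST
client looks roughly like this (a sketch only; the index name "my-index"
and the field "category" are made up for illustration):

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class EsGroupByExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // size(0): we only want the aggregation buckets, not the hits.
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchAllQuery())
                    .size(0)
                    .aggregation(AggregationBuilders.terms("by_category")
                            .field("category"));
            SearchResponse resp = client.search(
                    new SearchRequest("my-index").source(source),
                    RequestOptions.DEFAULT);
            // Each bucket is one group, like a SQL GROUP BY result row.
            Terms byCategory = resp.getAggregations().get("by_category");
            byCategory.getBuckets().forEach(b -> System.out.println(
                    b.getKeyAsString() + " -> " + b.getDocCount()));
        }
    }
}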

Best,
Forward

Jacky Lau <liuyon...@gmail.com> wrote on Fri, Jun 5, 2020 at 2:47 PM:

> Hi Etienne Chauchot,
> You can read https://www.jianshu.com/p/d32e17dab90c (it is in Chinese),
> which shows that the slice API has poor performance in the es-hadoop
> project.
>
> I also found that es-hadoop has deprecated this and disables sliced
> scrolls by default. You can see below, which I found in the latest
> es-hadoop release notes:
> ==== Configuration Changes
> `es.input.use.sliced.partitions` is deprecated in 6.5.0, and will be
> removed in 7.0.0. The default value for `es.input.max.docs.per.partition`
> (100000) will also be removed in 7.0.0, thus disabling sliced scrolls by
> default, and switching them to be an explicitly opt-in feature.
>
> added[5.0.0]
> `es.input.max.docs.per.partition` ::
> When reading from an {es} cluster that supports scroll slicing ({es}
> v5.0.0 and above), this parameter advises the connector on what the
> maximum number of documents per input partition should be. The connector
> will sample and estimate the number of documents on each shard to be read
> and divides each shard into input slices using the value supplied by this
> property. This property is a suggestion, not a guarantee. The final
> number of documents per partition is not guaranteed to be below this
> number, but rather, they will be close to this number. This property is
> ignored if you are reading from an {es} cluster that does not support
> scroll slicing ({es} any version below v5.0.0). By default, this value is
> unset, and the input partitions are calculated based on the number of
> shards in the indices being read.
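>
> So if we did want sliced scrolls in an es-hadoop job, they are now an
> explicit opt-in; a minimal sketch (the node address and index name are
> made up for illustration):
>
> import org.apache.hadoop.conf.Configuration;
>
> public class EsHadoopSliceOptIn {
>     public static void main(String[] args) {
>         Configuration conf = new Configuration();
>         conf.set("es.nodes", "localhost:9200");   // assumed local cluster
>         conf.set("es.resource", "my-index/_doc"); // assumed index to read
>         // Sliced scrolls are off by default since 7.0.0; setting this
>         // property explicitly opts back in and caps docs per partition.
>         conf.set("es.input.max.docs.per.partition", "100000");
>     }
> }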
>
>
>
> Jacky Lau wrote
> > Hi Etienne Chauchot,
> > Thanks for your discussion.
> >
> > For 1): we do not support an ES unbounded source currently.
> >
> > For 2): RichParallelSourceFunction is used for streaming; InputFormat is
> > used for batch. A minimal sketch of the batch shape is below.
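> >
> > The skeleton below is hypothetical (the class name and bodies are made
> > up); a real connector would open an ES scroll in open() and feed
> > documents out through nextRecord():
> >
> > import org.apache.flink.api.common.io.GenericInputFormat;
> > import org.apache.flink.core.io.GenericInputSplit;
> >
> > import java.io.IOException;
> > import java.util.Collections;
> > import java.util.Iterator;
> >
> > public class EsDocInputFormat extends GenericInputFormat<String> {
> >     private Iterator<String> hits;
> >
> >     @Override
> >     public void open(GenericInputSplit split) throws IOException {
> >         super.open(split);
> >         // Real source: start a scroll for the shard(s) of this split.
> >         hits = Collections.emptyIterator();
> >     }
> >
> >     @Override
> >     public boolean reachedEnd() {
> >         return !hits.hasNext();
> >     }
> >
> >     @Override
> >     public String nextRecord(String reuse) {
> >         return hits.next();
> >     }
> > }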
> >
> > For 3): I downloaded Beam just now, and the Beam ES connector is also
> > using es-hadoop. I have read the es-hadoop code (an input split contains
> > a shard and a slice, and I think that is better when different shards
> > have different numbers of docs); you can see it here:
> > https://github.com/elastic/elasticsearch-hadoop. But the code is not
> > good, so we do not want to reference it. And if you look at Presto, it
> > also just uses an input split per shard, with no slice. The difference
> > boils down to the split layout sketched below.
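> >
> > For illustration only (all names are made up), the two layouts differ in
> > whether a split carries a slice id in addition to a shard id:
> >
> > import java.io.Serializable;
> >
> > public class EsShardSliceSplit implements Serializable {
> >     private final String index;
> >     private final int shardId;
> >     private final int sliceId;     // slice within this shard's scroll
> >     private final int totalSlices; // 1 = no slicing, the Presto layout
> >
> >     public EsShardSliceSplit(String index, int shardId,
> >                              int sliceId, int totalSlices) {
> >         this.index = index;
> >         this.shardId = shardId;
> >         this.sliceId = sliceId;
> >         this.totalSlices = totalSlices;
> >     }
> > }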
> >
> > For 4): because the Flink ES connector already uses different clients
> > (the TransportClient for ES 5, the high-level REST client for ES 6 and
> > 7), we just reuse them, which will not change too much code.
