Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working.
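Besides parallelCollectionScan (which later MongoDB releases, 4.2 onward, removed), a common workaround is to split the collection into disjoint key ranges and let each parallel source subtask query only its own ranges. Below is a minimal sketch of just the split and assignment logic. It assumes a numeric, roughly uniformly distributed `_id`. The names `SplitAssignment`, `KeyRange`, `split`, `assignedTo` and `asFilter` are hypothetical and not part of Flink or the MongoDB driver. In a Flink `RichParallelSourceFunction`, the subtask index and parallelism would come from `getRuntimeContext().getIndexOfThisSubtask()` and `getNumberOfParallelSubtasks()`. The actual Mongo query is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Flink or MongoDB API): divides a key range
 * [minKey, maxKey) into contiguous splits and decides which splits a given
 * parallel subtask should read. Assumes a numeric, roughly uniform _id.
 */
final class SplitAssignment {

    /** A half-open key range [lower, upper). */
    static final class KeyRange {
        final long lower;
        final long upper;

        KeyRange(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        /** Renders the range as a MongoDB query filter in shell syntax. */
        String asFilter() {
            return "{ _id: { $gte: " + lower + ", $lt: " + upper + " } }";
        }
    }

    /** Splits [minKey, maxKey) into numSplits contiguous ranges whose sizes differ by at most 1. */
    static List<KeyRange> split(long minKey, long maxKey, int numSplits) {
        if (numSplits <= 0 || maxKey < minKey) {
            throw new IllegalArgumentException("need numSplits > 0 and minKey <= maxKey");
        }
        long span = maxKey - minKey;       // assumes this subtraction does not overflow
        long base = span / numSplits;      // minimum number of keys per split
        long remainder = span % numSplits; // the first `remainder` splits get one extra key
        List<KeyRange> ranges = new ArrayList<>(numSplits);
        for (int i = 0; i < numSplits; i++) {
            long lower = minKey + base * i + Math.min(i, remainder);
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new KeyRange(lower, lower + size));
        }
        return ranges;
    }

    /** Round-robin assignment: subtask k reads splits k, k + p, k + 2p, ... */
    static List<KeyRange> assignedTo(List<KeyRange> splits, int subtaskIndex, int parallelism) {
        if (parallelism <= 0 || subtaskIndex < 0 || subtaskIndex >= parallelism) {
            throw new IllegalArgumentException("need 0 <= subtaskIndex < parallelism");
        }
        List<KeyRange> mine = new ArrayList<>();
        for (int i = subtaskIndex; i < splits.size(); i += parallelism) {
            mine.add(splits.get(i));
        }
        return mine;
    }
}
```

For example, with keys in [0, 10), three splits and a parallelism of 2, subtask 0 would read [0, 4) and [7, 10), and subtask 1 would read [4, 7). Round-robin keeps the assignment deterministic without any coordination between subtasks. The cost is skew when the keys are not evenly distributed.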
[1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan

On Mon, 29 Apr 2019 at 14:37, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> But what about parallelism with this implementation? From what I see
> there's only a single thread querying Mongo and fetching all the data... am I
> wrong?
>
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <
> w.d.zorgdra...@tudelft.nl> wrote:
>
>> For a framework I'm working on, we actually implemented a (basic) Mongo
>> source [1]. It's written in Scala and uses Json4s [2] to parse the data
>> into a case class. It uses a Mongo observer to iterate over a collection
>> and emit it into a Flink context.
>>
>> Cheers,
>> Wouter
>>
>> [1]:
>> https://github.com/codefeedr/codefeedr/blob/develop/codefeedr-plugins/codefeedr-mongodb/src/main/scala/org/codefeedr/plugins/mongodb/BaseMongoSource.scala
>>
>> [2]: http://json4s.org/
>>
>> On Mon, 29 Apr 2019 at 13:57, Flavio Pompermaier <
>> pomperma...@okkam.it> wrote:
>>
>>> I'm not aware of an official source/sink... if you want, you could try to
>>> exploit the Mongo HadoopInputFormat as in [1].
>>> The provided link uses a pretty old version of Flink, but it should not be
>>> a big problem to update the Maven dependencies and the code to a newer
>>> version.
>>>
>>> Best,
>>> Flavio
>>>
>>> [1] https://github.com/okkam-it/flink-mongodb-test
>>>
>>> On Mon, Apr 29, 2019 at 6:15 AM Hai <h...@magicsoho.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can anyone give me a clue about how to read MongoDB data as a
>>>> batch/streaming data source in Flink? I can't find a MongoDB connector in
>>>> the recent release.
>>>>
>>>> Many thanks