Re: Read mongo datasource in Flink

Wouter Zorgdrager Mon, 29 Apr 2019 05:59:06 -0700

Yes, that is correct. This is a really basic implementation that doesn't
take parallelism into account: a single source instance reads the whole
collection. I think you need something like MongoDB's parallelCollectionScan
command [1] to get that working.
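
Here is a rough sketch of what taking parallelism into account could look like. It uses a different technique than [1]: instead of parallelCollectionScan (which, as far as I know, was removed in MongoDB 4.2), each parallel source subtask queries a disjoint range of an indexed numeric field. The field name `key`, the helpers `splitRange`/`filterFor`, and the assumption that the min/max key bounds are known up front are all hypothetical. In a Flink RichParallelSourceFunction the subtask index and parallelism would come from getRuntimeContext.getIndexOfThisSubtask and getRuntimeContext.getNumberOfParallelSubtasks.

```scala
// Hypothetical helpers for splitting a collection scan across parallel subtasks,
// assuming an indexed numeric field named `key` with known bounds [minKey, maxKey).
object RangeSplit {
  // Split [minKey, maxKey) into `parallelism` contiguous, non-overlapping slices.
  // Assumes (maxKey - minKey) * parallelism fits in a Long.
  def splitRange(minKey: Long, maxKey: Long, parallelism: Int): Seq[(Long, Long)] = {
    require(parallelism > 0, "parallelism must be positive")
    require(maxKey >= minKey, "maxKey must not be smaller than minKey")
    val total = maxKey - minKey
    (0 until parallelism).map { i =>
      // Integer division spreads the remainder, so slice sizes differ by at most one.
      val lo = minKey + total * i / parallelism
      val hi = minKey + total * (i + 1) / parallelism
      (lo, hi)
    }
  }

  // The filter (as a JSON string) that subtask `subtaskIndex` would send to Mongo.
  def filterFor(subtaskIndex: Int, parallelism: Int, minKey: Long, maxKey: Long): String = {
    val (lo, hi) = splitRange(minKey, maxKey, parallelism)(subtaskIndex)
    s"""{"key": {"$$gte": $lo, "$$lt": $hi}}"""
  }

  def main(args: Array[String]): Unit = {
    println(splitRange(0L, 10L, 3))
    println(filterFor(1, 3, 0L, 10L))
  }
}
```

For example, with keys 0 to 10 and a parallelism of 3, subtask 1 would query {"key": {"$gte": 3, "$lt": 6}}. A skewed key distribution gives unevenly sized slices, so this is only a starting point.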


[1]:
https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan
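
For the observer-based source in the quoted message below, here is a minimal, dependency-free sketch of the general pattern. It is not the actual BaseMongoSource code. The driver's observer callbacks push documents into a queue, and the source's run loop drains that queue and emits each document. `DocObserver` is a hypothetical stand-in for the driver's Observer callbacks, and the `emit` function stands in for Flink's SourceContext.collect.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-in for the driver's Observer callbacks.
trait DocObserver[T] {
  def onNext(doc: T): Unit
  def onError(e: Throwable): Unit
  def onComplete(): Unit
}

// Events handed from the driver's callback thread to the source's run loop.
sealed trait Event[+T]
final case class Next[+T](doc: T) extends Event[T]
final case class Failed(e: Throwable) extends Event[Nothing]
case object Done extends Event[Nothing]

// Hypothetical bridge: callbacks enqueue events, drain() dequeues and emits them.
final class QueueBridge[T] extends DocObserver[T] {
  // Unbounded queue: simple, but provides no backpressure.
  private val queue = new LinkedBlockingQueue[Event[T]]()

  override def onNext(doc: T): Unit = queue.put(Next(doc))
  override def onError(e: Throwable): Unit = queue.put(Failed(e))
  override def onComplete(): Unit = queue.put(Done)

  // Blocks until the stream completes or fails; `emit` plays the role of
  // Flink's SourceContext.collect inside a real source's run() method.
  def drain(emit: T => Unit): Unit = {
    var running = true
    while (running) {
      queue.take() match {
        case Next(doc) => emit(doc)
        case Failed(e) => throw e
        case Done      => running = false
      }
    }
  }
}

object BridgeDemo {
  def main(args: Array[String]): Unit = {
    val bridge = new QueueBridge[String]
    // Simulate the driver delivering documents on its own thread.
    val producer = new Thread(() => {
      Seq("""{"a": 1}""", """{"a": 2}""").foreach(bridge.onNext)
      bridge.onComplete()
    })
    producer.start()
    bridge.drain(doc => println(doc))
    producer.join()
  }
}
```

The unbounded queue keeps the sketch short, but a real source would want to limit how many documents the driver delivers at a time. One option is to request documents in batches through the subscription handed to onSubscribe, if the driver version in use supports that.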

On Mon, 29 Apr 2019 at 14:37, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> But what about parallelism with this implementation? From what I see,
> there's only a single thread querying Mongo and fetching all the data. Am I
> wrong?
>
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <w.d.zorgdra...@tudelft.nl> wrote:
>
>> For a framework I'm working on, we actually implemented a (basic) Mongo
>> source [1]. It's written in Scala and uses Json4s [2] to parse each
>> document into a case class. It uses a Mongo observer to iterate over a
>> collection and emit the documents into the Flink source context.
>>
>> Cheers,
>> Wouter
>>
>> [1]:
>> https://github.com/codefeedr/codefeedr/blob/develop/codefeedr-plugins/codefeedr-mongodb/src/main/scala/org/codefeedr/plugins/mongodb/BaseMongoSource.scala
>>
>> [2]: http://json4s.org/
>>
>> On Mon, 29 Apr 2019 at 13:57, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> I'm not aware of an official source/sink. If you want, you could try to
>>> use the Mongo HadoopInputFormat as in [1].
>>> The linked project uses a pretty old version of Flink, but it should not be
>>> a big problem to update the Maven dependencies and the code to a newer
>>> version.
>>>
>>> Best,
>>> Flavio
>>>
>>> [1] https://github.com/okkam-it/flink-mongodb-test
>>>
>>> On Mon, Apr 29, 2019 at 6:15 AM Hai <h...@magicsoho.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> Can anyone give me a clue about how to read MongoDB data as a
>>>> batch/streaming data source in Flink? I can't find a MongoDB connector in
>>>> the recent release versions.
>>>>
>>>>
>>>> Many thanks
>>>>
>>>
>>>
>
