Re: Read mongo datasource in Flink

2019-04-29 Thread Kenny Gorman
Just a thought: a robust and high-performance way to potentially achieve your goals is Debezium -> Kafka -> Flink (https://debezium.io/docs/connectors/mongodb/). Good robust handling of various topologies, reasonably good scaling properties, good resta
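To make the Debezium route a bit more concrete: the Flink job reading the Kafka topic would apply a small mapping function to each record value. The sketch below is stdlib-only Python, not Flink code, and assumes the Debezium 0.9/1.x MongoDB event layout: `op` is one of `c`/`u`/`d`/`r`, `after` carries the full document as a JSON string, and updates carry a `patch` string instead. The function name `decode_change_event` is hypothetical; check the envelope against the connector version you actually deploy.

```python
import json
from typing import Optional, Tuple

# Assumed Debezium MongoDB op codes: c=create, u=update, d=delete, r=snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def decode_change_event(raw: str) -> Tuple[str, Optional[dict]]:
    """Turn one Kafka record value into (operation, document-or-patch).

    Hypothetical helper; assumes the 0.9/1.x Debezium MongoDB envelope.
    """
    event = json.loads(raw)
    # With the JSON converter and schemas enabled, the event sits under "payload";
    # without schemas, the event itself is the payload.
    payload = event.get("payload") or event
    op = OPS.get(payload.get("op"), "unknown")
    # "after" (full document) and "patch" (update operator) are JSON strings,
    # so they need a second decoding step. Deletes carry neither.
    body = payload.get("after") or payload.get("patch")
    return op, (json.loads(body) if body else None)
```

A delete event decodes to `("delete", None)`, so downstream operators need to handle a missing document. Kafka tombstone records (null values) are not covered by this sketch.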

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working. [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan On Mon, 29 Apr 2019 at 1
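One way to get parallel reads without relying on `parallelCollectionScan` (which, to my knowledge, MongoDB removed in 4.2) is to cut the collection into disjoint `_id` ranges and let each source subtask query its own range. The helper below is a hypothetical, driver-agnostic Python sketch that only builds the query filters; how the boundary keys are chosen (for example by sampling `_id` values) is left as an assumption.

```python
from typing import Any, Dict, List, Optional


def range_filters(boundaries: List[Any]) -> List[Dict[str, Any]]:
    """Build one MongoDB query filter per contiguous _id range.

    `boundaries` are sorted split points (how they are obtained is an assumption);
    k boundaries yield k + 1 ranges that together cover the whole collection.
    """
    filters: List[Dict[str, Any]] = []
    lower: Optional[Any] = None
    for upper in boundaries + [None]:
        cond: Dict[str, Any] = {}
        if lower is not None:
            cond["$gte"] = lower  # inclusive lower bound
        if upper is not None:
            cond["$lt"] = upper  # exclusive upper bound, so ranges never overlap
        # No bounds at all (no boundaries given) means "scan everything".
        filters.append({"_id": cond} if cond else {})
        lower = upper
    return filters
```

Half-open ranges (`$gte` / `$lt`) guarantee every document matches exactly one filter, which keeps the parallel readers from emitting duplicates.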

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
But what about parallelism with this implementation? From what I see, there's only a single thread querying Mongo and fetching all the data. Am I wrong? On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager wrote: > For a framework I'm working on, we actually implemented a (basic) Mongo > source [1].
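The single-reader issue is usually addressed by giving each parallel source instance a disjoint subset of the work. In a Flink parallel source, each instance can learn its own index and the total parallelism from the runtime context (`getIndexOfThisSubtask()` and `getNumberOfParallelSubtasks()`). The stdlib Python sketch below only shows the assignment arithmetic; the function name is hypothetical and a plain list of split descriptors stands in for real Mongo queries.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def splits_for_subtask(splits: Sequence[T], parallelism: int, subtask_index: int) -> List[T]:
    """Round-robin assignment: subtask i reads splits i, i + p, i + 2p, ... (p = parallelism)."""
    if not 0 <= subtask_index < parallelism:
        raise ValueError("subtask_index must be in [0, parallelism)")
    # Slicing with a step of `parallelism` gives each subtask a disjoint subset.
    return list(splits[subtask_index::parallelism])
```

Every split goes to exactly one subtask; when there are more subtasks than splits, the extra subtasks get an empty list and stay idle.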

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Thanks for sharing! That's great! Original message: Sender: Wouter Zorgdrager <w.d.zorgdra...@tudelft.nl> Recipient: hai...@magicsoho.com Cc: useru...@flink.apache.org Date: Monday, Apr 29, 2019 20:05 Subject: Re: Read mongo datasource in Flink For a framework I'm working on, we act

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit its documents into a Flink context. Cheers, Wouter [1]: https://github.com/codefee
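The structure described here (parse each document into a typed record, then push it into the Flink context) can be sketched in Python for illustration. This is not the Scala source from [1]: a frozen dataclass stands in for the case class, `json.loads` for Json4s, an iterable of JSON strings for the Mongo observer, and the `emit` callback for Flink's `SourceContext.collect`. The `Person` type and its fields are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Person:
    """Hypothetical record type, standing in for a Scala case class."""
    name: str
    age: int


def run_source(documents: Iterable[str], emit: Callable[[Person], None]) -> int:
    """Parse each JSON document into a Person and hand it to `emit`.

    `documents` stands in for the Mongo cursor/observer and `emit` for the
    Flink source context. Returns the number of records emitted.
    """
    count = 0
    for raw in documents:
        doc = json.loads(raw)
        # Only the fields the record needs are extracted; extra fields such as _id are ignored.
        emit(Person(name=doc["name"], age=int(doc["age"])))
        count += 1
    return count
```

Note that a loop like this runs in a single thread over the whole collection, which is exactly the parallelism limitation raised in the follow-up messages.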

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Hi Flavio, that's good, thank you. I will try it later. Regards Original message: Sender: Flavio Pompermaier <pomperma...@okkam.it> Recipient: hai...@magicsoho.com Cc: useru...@flink.apache.org Date: Monday, Apr 29, 2019 19:56 Subject: Re: Read mongo datasource in Flink I'm not aware

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
I'm not aware of an official source/sink. If you want, you could try to use the Mongo HadoopInputFormat as in [1]. The provided link uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version. Best, Flavio [1] https://g

Read mongo datasource in Flink

2019-04-28 Thread Hai
Hi, can anyone give me a clue about how to read MongoDB data as a batch/streaming data source in Flink? I can't find a MongoDB connector in the recent release versions. Many thanks