OK, this makes sense. I will try it out.

One more question, about performance: which of the two connectors would scan the existing collection faster? Say the existing collection has 10 million records and takes about 1 GB of storage.

Thanks
Sachin

On Fri, 16 Aug 2024 at 4:09 PM, Jiabao Sun <jiabao...@apache.org> wrote:

> Yes, you can use flink-connector-mongodb-cdc to process both existing and
> new data.
>
> See
> https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/connectors/flink-sources/mongodb-cdc/#startup-reading-position
>
> Best,
> Jiabao
>
> On 2024/08/16 10:26:55 Sachin Mittal wrote:
> > Hi Jiabao,
> > My usecase is that when I start my flink job it should load and process all
> > the existing data in a collection and also wait and process any new data
> > that comes along the way.
> > As I notice that flink-connector-mongodb would process all the existing
> > data, so do I still need this connector or I can use
> > flink-connector-mongodb-cdc to process both existing and new data ?
> >
> > Thanks
> > Sachin
> >
> >
> > On Fri, Aug 16, 2024 at 3:46 PM Jiabao Sun <jiabao...@apache.org> wrote:
> >
> > > Hi Sachin,
> > >
> > > flink-connector-mongodb supports batch reading and writing to MongoDB,
> > > similar to flink-connector-jdbc, while flink-connector-mongodb-cdc supports
> > > streaming MongoDB changes.
> > >
> > > If you need to stream MongoDB changes, you should use
> > > flink-connector-mongodb-cdc.
> > > You can refer to the following documentation about mongodb cdc.
> > >
> > > https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/connectors/flink-sources/mongodb-cdc/
> > >
> > > Best,
> > > Jiabao
> > >
> > > On 2024/08/16 09:46:47 Sachin Mittal wrote:
> > > > Hi,
> > > > I have a scenario where I load a collection from MongoDB inside Flink using
> > > > flink-connector-mongodb.
> > > > What I additionally want is any future changes (insert/updates) to that
> > > > collection is also streamed inside my Flink Job.
> > > >
> > > > What I was thinking of is to use a CDC connector to stream data to my Flink
> > > > job.
> > > >
> > > > When researching this I found Flink CDC and they have a CDC connector for
> > > > MongoDB - flink-connector-mongodb-cdc
> > > >
> > > > However I am not able to figure out how to stream those changes also to my
> > > > Job which is also reading from the same collection.
> > > >
> > > > Thanks
> > > > Sachin