Hi Beamers (is that a thing?),

I am relatively new to Beam and am attempting to use the Python WriteToMongoDB 
transform, but I ran into some undesirable behavior. The implementation seems to 
wait until the entire PCollection has been materialized before starting the 
actual Mongo writes. My use case involves millions of new document writes as 
part of a larger pipeline, so this results in a massive backup at best, and at 
worst memory issues that cause the whole job to fail.

I would like to switch to batched incremental writes but, as far as I can tell, 
this is not possible with the current WriteToMongoDB implementation. The main 
obstacle appears to be the Reshuffle step, which requires the entire set of 
elements before executing. I attempted different window and trigger 
configurations, but the transform appears to ignore them and use a global 
window regardless.

Am I missing something here? Is there some other way around this constraint? 
I'm nearing the point of just implementing my own Mongo writer, but I wanted to 
check here first in case anyone can offer alternative guidance.
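
For concreteness, here is a rough sketch of the kind of batched writer I have 
in mind. All names are illustrative; in a real pipeline this would subclass 
beam.DoFn and call pymongo's bulk_write in the flush step. Here the Beam and 
Mongo dependencies are stubbed out with an injected sink so the batching logic 
stands on its own:

```python
class BatchedMongoWriteFn:
    """Buffers elements and flushes them in fixed-size batches.

    Hypothetical sketch: in a real pipeline this would subclass beam.DoFn,
    and _flush() would call collection.bulk_write(...) via pymongo. The
    sink is injected here so the batching behavior is easy to verify.
    """

    def __init__(self, sink, batch_size=1000):
        self._sink = sink            # callable taking a list of documents
        self._batch_size = batch_size
        self._buffer = []

    def start_bundle(self):
        # Fresh buffer per bundle (mirrors beam.DoFn.start_bundle).
        self._buffer = []

    def process(self, element):
        self._buffer.append(element)
        if len(self._buffer) >= self._batch_size:
            self._flush()

    def finish_bundle(self):
        # Flush any remainder when the bundle ends so no elements are lost.
        if self._buffer:
            self._flush()

    def _flush(self):
        self._sink(list(self._buffer))
        self._buffer = []


# Quick demonstration with a list standing in for the Mongo collection:
batches = []
fn = BatchedMongoWriteFn(sink=batches.append, batch_size=2)
fn.start_bundle()
for doc in [{"a": 1}, {"a": 2}, {"a": 3}]:
    fn.process(doc)
fn.finish_bundle()
# batches is now [[{"a": 1}, {"a": 2}], [{"a": 3}]]
```

The point is just that writes happen incrementally as elements arrive, rather 
than after the whole PCollection is assembled, which is what the Reshuffle in 
WriteToMongoDB seems to prevent.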

Thanks!
Jon
