Hello Apache Beam Community,

I'm Marcin, and I am working on a project that uses Apache Beam
2.57.0. I have run into a problem when reading data from MongoDB
with the "mongodbio" connector. The pipeline fails during the read,
before it reaches any transformation step, with an InvalidBSON error
caused by out-of-range dates.

Error Message:

bson.errors.InvalidBSON: year 55054 is out of range (Consider Using
CodecOptions(datetime_conversion=DATETIME_AUTO) or
MongoClient(datetime_conversion='DATETIME_AUTO')). See:
https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes

Here are the details of my setup:

Apache Beam version: 2.57.0
Python version: 3.10

My MongoDB collection can contain dates outside the range that
Python's datetime type supports (years 1 through 9999), such as year 0
or years greater than 9999. Decoding those values causes this error.
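
To illustrate the range limit without MongoDB, here is a minimal
standard-library sketch. BSON stores dates as milliseconds since the
Unix epoch, and the helper name millis_to_datetime_or_raw is
hypothetical: it only mimics the "fall back to the raw value" idea
behind DATETIME_AUTO and is not PyMongo's implementation.

```python
from datetime import datetime, timedelta, timezone

# Reference point for BSON dates: milliseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime_or_raw(ms):
    """Return a datetime when ms fits Python's range, else the raw integer.

    Hypothetical helper for illustration only.
    """
    try:
        return EPOCH + timedelta(milliseconds=ms)
    except OverflowError:
        # Python's datetime only covers years 1 through 9999.
        return ms


in_range = millis_to_datetime_or_raw(0)
too_late = millis_to_datetime_or_raw(1_675_000_000_000_000)  # far past year 9999
too_early = millis_to_datetime_or_raw(-719_163 * 86_400_000)  # one day before year 1
```

Only the first call yields a datetime. The other two fall back to the
integer because the result cannot be represented.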

I have handled this issue in standalone Python scripts by using
PyMongo's CodecOptions and DatetimeConversion. However, I am having
difficulty applying the same logic inside an Apache Beam pipeline, and
I don't think it is possible without changing the source code of the
connector. I would appreciate any guidance or suggestions on how to
resolve this within the Beam framework.

Thank you for your assistance.

Best regards,
Marcin
