Hi all, I was hoping for some advice on creating a MicroBatchStream for MongoDB.
MongoDB has a tailable cursor that listens for changes in a collection, known as a change stream. As the change stream cursor is iterated, it reports a resume token that identifies the last operation seen. I would like to use this resume token as my offset.

What I'm struggling with is how to signal that resume token from the PartitionReader back to the MicroBatchStream. Even when the cursor returns no results, a new last-seen resume token is still available. When planning partitions I only know the start offset; the resume token held by the cursor represents the end offset. Because I'm using a single change stream cursor, I only produce a single partition per micro-batch, but I would like the next partition to continue from the last resume token seen by the PartitionReader.

Is this approach possible? If so, how would I pass that information back to the Spark driver / MicroBatchStream?
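To make the shape of the problem concrete, here is a minimal Scala sketch against the Spark 3 DataSource V2 streaming API. The class names (ResumeTokenOffset, ChangeStreamPartition, MongoMicroBatchStream, ChangeStreamPartitionReader) are placeholders I've invented for illustration and most bodies are elided; the cursor calls (watch, resumeAfter, cursor, tryNext, getResumeToken) are from the MongoDB Java driver:

import com.mongodb.client.{MongoChangeStreamCursor, MongoCollection}
import com.mongodb.client.model.changestream.ChangeStreamDocument
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}
import org.bson.{BsonDocument, Document}

// Offset wrapping a change stream resume token, serialized as its JSON form.
case class ResumeTokenOffset(resumeTokenJson: String) extends Offset {
  override def json(): String = resumeTokenJson
}

// Only the start token is known when the partition is planned on the driver.
case class ChangeStreamPartition(startTokenJson: String) extends InputPartition

class MongoMicroBatchStream extends MicroBatchStream {
  override def initialOffset(): Offset = ??? // e.g. a token captured at query start

  // The crux: the last-seen token lives in the cursor inside the executor-side
  // PartitionReader, so the driver has nothing accurate to return here.
  override def latestOffset(): Offset = ???

  // Single change stream cursor => a single partition per micro-batch.
  override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] =
    Array(ChangeStreamPartition(start.asInstanceOf[ResumeTokenOffset].resumeTokenJson))

  override def createReaderFactory(): PartitionReaderFactory = ???
  override def deserializeOffset(json: String): Offset = ResumeTokenOffset(json)
  override def commit(end: Offset): Unit = ()
  override def stop(): Unit = ()
}

// Runs on an executor; this is where new resume tokens are actually observed.
class ChangeStreamPartitionReader(
    collection: MongoCollection[Document],
    partition: ChangeStreamPartition
) extends PartitionReader[InternalRow] {

  private val cursor: MongoChangeStreamCursor[ChangeStreamDocument[Document]] =
    collection
      .watch()
      .resumeAfter(BsonDocument.parse(partition.startTokenJson))
      .cursor()

  private var current: ChangeStreamDocument[Document] = _

  override def next(): Boolean = {
    current = cursor.tryNext() // non-blocking; null when no new events yet
    current != null
  }

  override def get(): InternalRow = ??? // convert `current` into an InternalRow

  override def close(): Unit = {
    // Even when no documents arrived, the cursor has advanced:
    val lastSeen: BsonDocument = cursor.getResumeToken
    // ...but I don't see a channel to hand `lastSeen` back to the driver-side
    // MicroBatchStream for the next round of planInputPartitions.
    cursor.close()
  }
}

The gap is in ChangeStreamPartitionReader.close(): the freshest resume token exists only on the executor, while latestOffset() and planInputPartitions() run on the driver.

Ross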
