Hi,

I'd like to clarify the semantics of how an event emitted by a source is processed downstream.
I have a custom source that offloads data from our data warehouse. The source keeps state that tracks the latest timestamps that were read. When unloading, I push the data from the warehouse to S3 and then scan the S3 bucket prefix to find out which files to read. These S3 file descriptors are only read later, in an AbstractStreamOperator (ASO) that is the direct child of the source, and there is a rehash boundary between the two. The ASO receives the S3 file descriptors one by one, reads them, does some additional parsing, and sends the results downstream to various operators.

My question is: is there a guarantee that once an element is pushed downstream from the source function to the ASO using SourceContext.collect, the call will only return after the element has been processed throughout the entire execution graph?

The reason I'm asking is that I'm refactoring the ASO, and I'm wondering whether the file descriptors received via processElement in the ASO should be stored in state until I finish processing them. If SourceContext.collect does guarantee that it only returns once the element has passed through the entire DAG (even though some of the data needs to go over the wire), then I don't need any additional state and can rely on the fact that the element has been pushed synchronously end to end.

--
Best Regards,
Yuval Itzchakov.