Hi! There is no such guarantee unless the whole DAG is chained into a single task. SourceContext.collect hands the record over to the runtime and returns; it does not wait for downstream processing. Flink's runtime executes each task within a single thread, while different tasks run in different threads, possibly even on different machines, with records crossing a rehash boundary via asynchronously flushed network buffers.
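As a rough illustration of why collect() cannot block end to end, here is a minimal non-Flink Java sketch of the threading model. The BlockingQueue stands in for Flink's network buffers between two tasks; the class, method names, and timings are all hypothetical, not Flink API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for two tasks connected by a buffer; NOT Flink API.
public class CollectSemantics {
    // Returns {processedWhenCollectReturned, processedEventually}.
    public static boolean[] run() throws InterruptedException {
        BlockingQueue<String> buffer = new LinkedBlockingQueue<>(); // "network" buffer
        CountDownLatch processed = new CountDownLatch(1);

        // Downstream operator running in its own thread, like a separate task.
        Thread downstream = new Thread(() -> {
            try {
                buffer.take();         // receive the record
                Thread.sleep(100);     // simulate processing time
                processed.countDown(); // record fully processed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        downstream.start();

        // "collect": enqueue the record and return immediately.
        buffer.put("s3://bucket/file-1");
        boolean doneAtReturn = processed.getCount() == 0; // false: downstream hasn't finished

        processed.await(); // downstream does finish eventually
        downstream.join();
        return new boolean[] { doneAtReturn, true };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("processed when collect returned: " + r[0]);
        System.out.println("processed eventually: " + r[1]);
    }
}
```

This mirrors why, in the real job, file descriptors received in processElement should be tracked in operator state until their processing is finished, rather than assumed complete because collect returned upstream.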
Yuval Itzchakov <yuva...@gmail.com> wrote on Thu, Aug 5, 2021 at 2:10 AM:
> Hi,
>
> I have a question regarding the semantics of event processing from a
> source downstream that I want to clarify.
>
> I have a custom source which offloads data from our data warehouse. In my
> custom source, I have some state which keeps track of the latest timestamps
> that were read. When unloading, I push the data from the warehouse to S3
> and then scan the S3 bucket prefix to know which files to read. These S3
> file descriptors are only later read in an AbstractStreamOperator that is
> the direct child of the source, and there is a rehash boundary between the
> two.
>
> The ASO receives these S3 file descriptors one by one, reads them, does
> some additional parsing, and sends them downstream to various operators.
>
> My question is: is there a guarantee that once an element is pushed
> downstream from the Source function to the ASO using SourceContext.collect,
> the call will only return once the element has been processed throughout
> the entire execution graph?
>
> The reason I'm asking is that I'm doing some refactoring work on the ASO
> and I'm wondering whether these file prefixes, received via the
> processElement function in the ASO, should be stored in state somewhere
> until I finish processing them. If SourceContext.collect does guarantee
> that, once pushed, it only returns when the element has passed through the
> entire DAG (even though some of the data needs to go over the wire), then
> I don't need any additional storage for that state and can rely on the
> fact that the element has been pushed synchronously end to end.
>
> --
> Best Regards,
> Yuval Itzchakov.