Hi Salva and Yun,

Yun is correct that the collector is not thread-safe, so writes to it should be guarded.
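As a rough illustration of the "use a queue and let a single thread output" option, here is a minimal sketch. It is not taken from Salva's code: `MatchBuffer` is a hypothetical helper, and the `Collector` trait below is only a stand-in for `org.apache.flink.util.Collector` so that the snippet runs without Flink on the classpath. The idea is that library callbacks only enqueue matches, and the operator's own thread drains the queue into `out` from inside `processElement1`/`processElement2`.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}
import scala.collection.mutable.ArrayBuffer

// Stand-in for org.apache.flink.util.Collector (assumption: only collect is needed here).
trait Collector[T] { def collect(record: T): Unit }

// Hypothetical helper: callback threads call offer, the operator thread calls drainTo.
final class MatchBuffer[T <: AnyRef] {
  private val pending = new ConcurrentLinkedQueue[T]()

  // Safe to call from any library thread; it never touches the collector.
  def offer(record: T): Unit = pending.add(record)

  // Call only from the thread that runs processElement1/processElement2.
  // Returns the number of records handed to the collector.
  def drainTo(out: Collector[T]): Int = {
    var emitted = 0
    var next = pending.poll()
    while (next != null) {
      out.collect(next)
      emitted += 1
      next = pending.poll()
    }
    emitted
  }
}

object MatchBufferDemo {
  def main(args: Array[String]): Unit = {
    val buffer = new MatchBuffer[String]

    // A small thread pool plays the role of the third-party library's threads.
    val pool = Executors.newFixedThreadPool(4)
    (1 to 100).foreach(i => pool.execute(() => buffer.offer(s"match-$i")))
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)

    // A single thread (here: main) is the only one that writes to the collector.
    val seen = ArrayBuffer.empty[String]
    val out = new Collector[String] {
      def collect(record: String): Unit = seen += record
    }
    val emitted = buffer.drainTo(out)
    println(s"emitted $emitted records")
  }
}
```

In a real `CoProcessFunction` the drain would run at the start of each `processElement1`/`processElement2` call, using the `out` passed to that call. This only addresses thread safety: a match stays in the buffer until the next element (or a timer) triggers a drain.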
In addition, a pattern that issues a request to a third-party multi-threaded library and registers a callback for a later result does not play well with checkpointing. In your case, if a failure happens, the data (or requests) that are "in-flight" are not part of any checkpoint, so you may lose data.

Your use case seems better suited to the AsyncIO pattern [1] supported by Flink, and it may make sense to use that for your project.

I hope this helps,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html

On Wed, Feb 12, 2020 at 9:03 AM Yun Gao <yungao...@aliyun.com> wrote:
>
> Hi Salva,
>
> As far as I know,
> 1. `out: Collector[T]` does not support multi-threaded access; only one thread may write records at a time. If multiple threads use `out`, their access needs to be coordinated in some way (for example, with a lock, or with a queue that caches records so that a single thread outputs them through `out`).
> 2. With the current implementation, `out` is not recreated. However, I think that is an implementation detail. If the processing logic still happens in the `processElement` method, is it possible to always use the `out` object passed into the method?
>
> Best,
> Yun
>
> ------------------------------------------------------------------
> From: Salva Alcántara <salcantara...@gmail.com>
> Send Time: 2020 Feb. 12 (Wed.) 13:33
> To: user <user@flink.apache.org>
> Subject: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter
>
> I am working on a `CoProcessFunction` that uses a third-party library for detecting certain patterns of events based on some rules. So, in the end, the `processElement1` method basically forwards the events to this library and registers a callback so that, when a match is detected, the CoProcessFunction can emit an output event. To achieve this, the callback relies on a reference to the `out: Collector[T]` parameter of `processElement1`.
>
> Having said that, I am not sure whether this use case is well supported by Flink, since:
>
> 1. There might be multiple threads spawned by the third-party library (let's say I have no control over the number of threads spawned; this is decided by the library).
> 2. I am not sure whether `out` might be recreated by Flink at some point, invalidating the references held by the callbacks and making them crash.
>
> So far I have not observed any issues, but I have only run my program at small scale. It would be great to hear from the experts whether my approach is valid or not.
>
> PS: Also posted in
> https://stackoverflow.com/questions/60181678/using-multithreaded-library-within-processfunction-with-callbacks-relying-on-the
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/