Hi Salva and Yun,

Yun is correct on that the collector is not thread-safe so writing
should be guarded.

In addition, such a pattern that issues a request to a 3rd party
multi-threaded library and registers a callback for the future does
not play well with checkpointing. In your case, if a failure happens,
the data (or requests) that is "in-flight" are not part of any
checkpoint, thus you may have data loss. Your pattern seems more
suitable to the AsyncIO pattern [1] supported by Flink and it may make
sense to use that for you project.

I hope this helps,
Kostas

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html

On Wed, Feb 12, 2020 at 9:03 AM Yun Gao <yungao...@aliyun.com> wrote:
>
>       Hi Salva,
>
>             As far as I know,
>                1. Out : Collector[T] could not support multi-thread 
> accessing, namely there could be only one thread writing records at one time. 
> If there are multiple threads using `out`, the access should need to be 
> coordinated in some way (for example, use lock, or use a queue to cache 
> record and let a single thread output them with `out`.
>                2. With the current implementation, `Out` would not be 
> recreated. However, I think it is implementation related, if the processing 
> logic still happens in the `processElement` method, is it possible to always 
> use the `out` object passed into the method?
>
>
>            Best,
>            Yun
>
>
>
> ------------------------------------------------------------------
> From:Salva Alcántara <salcantara...@gmail.com>
> Send Time:2020 Feb. 12 (Wed.) 13:33
> To:user <user@flink.apache.org>
> Subject:Using multithreaded library within ProcessFunction with callbacks 
> relying on the out parameter
>
> I am working on a `CoProcessFunction` that uses a third party library for
> detecting certain patterns of events based on some rules. So, in the end,
> the `ProcessElement1` method is basically forwarding the events to this
> library and registering a callback so that, when a match is detected, the
> CoProcessFunction can emit an output event. For achieving this, the callback
> relies on a reference to the `out: Collector[T]` parameter in
> `ProcessElement1`.
>
> Having said that, I am not sure whether this use case is well-supported by
> Flink, since:
>
> 1. There might be multiple threads spanned by the third party library (let's
> I have not any control over the amount of threads spanned, this is decided
> by the library)
> 2. I am not sure whether `out` might be recreated or something by Flink at
> some point, invalidating the references in the callbacks, making them crash
>
> So far I have not observed any issues, but I have just run my program in the
> small. It would be great to hear from the experts whether my approach is
> valid or not.
>
> PS: Also posted in
> https://stackoverflow.com/questions/60181678/using-multithreaded-library-within-processfunction-with-callbacks-relying-on-the
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
>

Reply via email to