Hello everyone, I've been struggling for a while to perform a session-window join using pyflink / SQL API. From my understanding it seems that the table API requires an equality on the window_start and window_end parameters. Unfortunately, in the case of session join (with static gap) it is rarely the case that the window of both left and right streams will produce same start and end, leaving the window inner-join empty. In order to get around this issue i've tried to cast both my left and right tables to a common schema, union them and make a session window on the unioned stream, which seems to work for 2 or 3 tables but I'm hitting severe performance issues when trying to scale this solution (my use-case has 15 tables with 10k records per second each on which I would like to perform a Session Window).
On the other hand, the looking at the description of the Datastream API of Session-Window Join<https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/operators/joining/#session-window-join> it seems to do exactly what I'm looking for: When performing a session window join, all elements with the same key that when “combined” fulfill the session criteria are joined in pairwise combinations and passed on to the JoinFunction or FlatJoinFunction. Again this performs an inner join, so if there is a session window that only contains elements from one stream, no output will be emitted! Unfortunately, it seems that WindowJoin is still not supported in the python API. After looking around in the code of the pyflink implementation, it seems to me that I could manage to propose a base implementation if pointed in the right direction by someone with experience on maintaining the pyflink repository. Would the community be interested in such support for Joins in python API ? If so I would be willing to get started with an implementation proposal and open a PR if the community. Thanks a lot ! Best Regrards, Hugo Polsinelli