Re: Easiest way to do a batch outer join

2023-08-09 Thread liu ron
Hi Flavio, I agree with you, but DataStream was originally designed for streaming scenarios, and it will take the community some time to improve its capabilities for batch scenarios. Best, Ron On Wed, Aug 9, 2023 at 16:37 Flavio Pompermaier wrote: > Hi Liu, > indeed my current experience migrating o

Re: Easiest way to do a batch outer join

2023-08-09 Thread Flavio Pompermaier
Hi Liu, indeed my current experience migrating old DataSet code to the new DataStream API is really frustrating. It's very complicated to write a Source (unless you use the deprecated SourceFunction or TableSource, which are easier) and some operations are really complicated because there should not be any win

Re: Easiest way to do a batch outer join

2023-08-08 Thread liu ron
Hi Flavio, IMO the current DataStream API is not on par with DataSet in terms of capabilities. I think you can try it with GlobalWindow. Another possible solution is to convert the DataStream to a table[1] first and then try it with a join on the Table API. [1] https://nightlies.apache.org/flin

Easiest way to do a batch outer join

2023-08-07 Thread Flavio Pompermaier
Hello everybody, I have a use case where I need to exclude from a DataStream (that is technically a DataSet, since I work in batch mode) all already-indexed documents. My idea is to perform an outer join, but I didn't find any simple example of a DataStream working in batch mode. I've tried using coGro
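For readers following the thread, the goal described above (keep only the documents whose id does not appear among the already-indexed ids) is a left anti-join: a left outer join followed by a filter on the rows that found no match. The sketch below is plain Java, not Flink code, and only assumes what the use case states. It shows the per-key logic a coGroup-based solution would place in a Flink `CoGroupFunction`, whose `coGroup(Iterable<IN1> first, Iterable<IN2> second, Collector<O> out)` method receives both groups for one key and could emit the elements of `first` only when `second` is empty. The `Doc` class, its fields, and the `antiJoin` helper are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Plain-Java sketch (NOT the Flink API) of a left anti-join:
// keep the left-side elements whose key never appears on the right side.
public class AntiJoinSketch {

    // Hypothetical document type, for illustration only.
    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }

        String id() {
            return id;
        }
    }

    // Hypothetical helper: groups the left input by key (as a coGroup would),
    // collects the keys present on the right, and emits a left group only when
    // the right side has no element with the same key.
    static <L, R, K> List<L> antiJoin(List<L> left, List<R> right,
                                      Function<L, K> leftKey, Function<R, K> rightKey) {
        Map<K, List<L>> leftGroups = new LinkedHashMap<>();
        for (L l : left) {
            leftGroups.computeIfAbsent(leftKey.apply(l), k -> new ArrayList<>()).add(l);
        }
        Set<K> rightKeys = new HashSet<>();
        for (R r : right) {
            rightKeys.add(rightKey.apply(r));
        }
        List<L> out = new ArrayList<>();
        for (Map.Entry<K, List<L>> group : leftGroups.entrySet()) {
            if (!rightKeys.contains(group.getKey())) {
                // No indexed document with this id: keep every left element for the key.
                out.addAll(group.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", "first"), new Doc("b", "second"), new Doc("c", "third"));
        List<String> indexedIds = List.of("b");

        List<String> remaining = new ArrayList<>();
        for (Doc d : antiJoin(docs, indexedIds, Doc::id, id -> id)) {
            remaining.add(d.id());
        }
        System.out.println(remaining); // prints [a, c]
    }
}
```

What this sketch deliberately leaves out is how a Flink batch job brings the two bounded inputs together per key (for example a GlobalWindow with a trigger that fires at the end of input, or the Table API join suggested in the thread); that part depends on the Flink version in use.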