Hi, Flavio

I agree with you, but DataStream was originally designed for streaming scenarios, and it will take some time for the community to improve its capabilities for batch scenarios.
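
For what it's worth, the per-key logic of an "exclude already-indexed documents" outer join is small. Here is a rough plain-Java sketch of it, with no Flink dependency. All names (`ExcludeIndexedSketch`, `excludeIndexed`) are made up for illustration. The `coGroup` method only mirrors the shape of Flink's `CoGroupFunction#coGroup`, with a `Consumer` standing in for Flink's `Collector`, and the in-memory grouping is a stand-in for what the framework would do per key. It does not address the windowing question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java sketch (hypothetical names, no Flink dependency) of an outer join
// used as an exclusion filter: keep left-side rows that have no right-side match.
final class ExcludeIndexedSketch {

    // Same shape as a co-group callback: receives all left and all right
    // elements for ONE key. Left elements are emitted only when the right
    // side is empty, i.e. the document has not been indexed yet.
    static <L, R> void coGroup(Iterable<L> documents, Iterable<R> indexed, Consumer<L> out) {
        if (!indexed.iterator().hasNext()) {
            for (L doc : documents) {
                out.accept(doc);
            }
        }
    }

    // Stand-in for the keyed grouping a framework would perform: groups both
    // inputs by key in memory, then calls coGroup once per left-side key.
    static <K, L, R> List<L> excludeIndexed(List<L> documents, Function<L, K> docKey,
                                            List<R> indexed, Function<R, K> indexedKey) {
        Map<K, List<L>> left = new LinkedHashMap<>(); // keeps input order of keys
        for (L doc : documents) {
            left.computeIfAbsent(docKey.apply(doc), k -> new ArrayList<>()).add(doc);
        }
        Map<K, List<R>> right = new HashMap<>();
        for (R row : indexed) {
            right.computeIfAbsent(indexedKey.apply(row), k -> new ArrayList<>()).add(row);
        }
        List<L> result = new ArrayList<>();
        for (Map.Entry<K, List<L>> entry : left.entrySet()) {
            List<R> matches = right.getOrDefault(entry.getKey(), Collections.emptyList());
            coGroup(entry.getValue(), matches, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc-1", "doc-2", "doc-3");
        List<String> alreadyIndexed = Arrays.asList("doc-2");
        // Keys are the document ids themselves in this toy example.
        System.out.println(excludeIndexed(docs, d -> d, alreadyIndexed, d -> d));
    }
}
```

In a real job, the body of `coGroup` would go into a `CoGroupFunction` applied after `coGroup(...).where(...).equalTo(...)`, and the window assigner still has to be chosen separately.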

Best,
Ron

Flavio Pompermaier <pomperma...@okkam.it> wrote on Wed, Aug 9, 2023 at 16:37:

> Hi Liu,
> indeed, my current experience migrating old DataSet code to the new
> DataStream API is really frustrating.
> It's very complicated to write a Source (unless you use the deprecated
> SourceFunction or TableSource, which are easier), and some operations are
> really complicated because there should not be any windowing involved (like
> in this case for outer joins or dataset broadcasting). I hope things will
> improve for batch scenarios in the future.
>
> Best,
> Flavio
>
> On Wed, Aug 9, 2023 at 4:55 AM liu ron <ron9....@gmail.com> wrote:
>
>> Hi, Flavio
>>
>> IMO, the current DataStream API is not aligned with DataSet in terms of
>> capabilities. I think you can try it with GlobalWindow. Another possible
>> solution is to convert the DataStream to a table [1] first and then try it
>> with a join on the Table API.
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/tableapi/
>>
>> Best,
>> Ron
>>
>> Flavio Pompermaier <pomperma...@okkam.it> wrote on Tue, Aug 8, 2023 at 00:23:
>>
>>> Hello everybody,
>>> I have a use case where I need to exclude from a DataStream (that is
>>> technically a DataSet, since I work in batch mode) all already-indexed
>>> documents.
>>> My idea is to perform an outer join, but I didn't find any simple example
>>> of DataStream working in batch mode. I've tried using coGroup(), but then
>>> it requires me to specify a window strategy. In batch mode I wouldn't
>>> expect that. Can I use a global window?
>>>
>>> Thanks in advance,
>>> Flavio
>>>
>>