Knowing that a filter emits at most as many tuples as it receives might help with cardinality estimation for cost-based optimization, for example when deciding on join strategies (broadcast vs. repartition, or which input becomes the build side of a hash join). However, as Stephan said, there are many cases where it does not make a difference, e.g., if the input cardinality of the filter (or the size of the other join input) is unknown.
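To make the cardinality point concrete, here is a minimal plain-Java sketch. It uses java.util.stream as an analogy and is not Flink's DataSet API. It shows that filter-then-project and a single flatMap give the same result, and why only the filter gives an optimizer something to work with: a filter can only drop records, while a flatMap may emit any number of records per input. The Row class, the score threshold, and the upperBound helper are all made up for illustration and are not part of Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterVsFlatMap {

    // Hypothetical stand-in for a tuple (id, name, score); not a Flink type.
    static final class Row {
        final int id;
        final String name;
        final int score;

        Row(int id, String name, int score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    // Variant 1: filter, then project to the name field.
    // A filter can only drop records, so output size <= input size.
    static List<String> filterThenProject(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.score >= 50)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }

    // Variant 2: a single flatMap that emits zero or one element per record.
    // Same result here, but in general a flatMap could emit many elements.
    static List<String> viaFlatMap(List<Row> rows) {
        return rows.stream()
                .flatMap(r -> r.score >= 50 ? Stream.of(r.name) : Stream.<String>empty())
                .collect(Collectors.toList());
    }

    // Hypothetical estimator: an upper bound on the output cardinality exists
    // only if the operator is a filter AND the input cardinality is known.
    static OptionalLong upperBound(OptionalLong inputCardinality, boolean isFilter) {
        if (isFilter && inputCardinality.isPresent()) {
            return inputCardinality;
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a", 90), new Row(2, "b", 10), new Row(3, "c", 50));

        System.out.println(filterThenProject(rows)); // [a, c]
        System.out.println(viaFlatMap(rows));        // [a, c]

        // Filter with known input size: bounded by the input size.
        System.out.println(upperBound(OptionalLong.of(rows.size()), true).isPresent());  // true
        // flatMap, or filter with unknown input size: no bound available.
        System.out.println(upperBound(OptionalLong.of(rows.size()), false).isPresent()); // false
        System.out.println(upperBound(OptionalLong.empty(), true).isPresent());          // false
    }
}
```

The last call mirrors the caveat above: if the input cardinality of the filter is unknown, knowing that the operator is a filter does not yield a usable bound.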
I think chances are low that it makes a difference.

2015-05-04 14:53 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Thanks Sebastian and Fabian for the feedback, just one last question:
> what changes from the system's point of view if it knows that the number
> of output tuples is <= the number of input tuples?
> Is there any optimization that Flink can apply to the pipeline?
>
> On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> It should not make a difference. I think it's just personal taste.
>>
>> If your filter condition is simple enough, I'd go with Flink's Table API
>> because it does not require defining a Filter or FlatMapFunction.
>>
>>
>> 2015-05-04 14:43 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Hi Flinkers,
>>>
>>> I'd like to know whether it's better to perform a filter.project or a
>>> flatMap to filter tuples and do some projection after the filter.
>>> Functionally they are equivalent, but maybe I'm missing something.
>>>
>>> Thanks in advance,
>>> Flavio