That might help with cardinality estimation for cost-based optimization.
For example when deciding about join strategies (broadcast vs. repartition,
build-side of a hash join).
However, as Stephan said, there are many cases where it does not make a
difference, e.g. if the input cardinality of the f
If the system has to decide data shipping strategies for a join (e.g.,
broadcasting one side) it helps to have good estimates of the input sizes.
On 04.05.2015 14:53, Flavio Pompermaier wrote:
Thanks Sebastian and Fabian for the feedback, just one last question:
what does change from the system
Thanks Sebastian and Fabian for the feedback, just one last question:
what does change from the system point of view to know that the output
tuples is <= the number of input tuples?
Is there any optimization that Flink can apply to the pipeline?
On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske wrot
Hah, interesting to see how opinions differ ;-)
Sebastian has a point, that filter + project is more transparent to the
system. In some situations, this knowledge can help the optimizer, but
often, it will not matter.
Greetings,
Stephan
On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske wrote:
> I
It should not make a difference. I think its just personal taste.
If your filter condition is simple enough, I'd go with Flink's Table API
because it does not require to define a Filter or FlatMapFunction.
2015-05-04 14:43 GMT+02:00 Flavio Pompermaier :
> Hi Flinkers,
>
> I'd like to know wheth
filter + project is easier to understand for the system, as the number
of output tuples is guaranteed to be <= the number of input tuples. With
flatMap, the system cannot know an upper bound.
--sebastian
On 04.05.2015 14:43, Flavio Pompermaier wrote:
Hi Flinkers,
I'd like to know whether it'
Hi Flinkers,
I'd like to know whether it's better to perform a filter.project or a
flatMap to filter tuples and do some projection after the filter.
Functionally they are equivalent but maybe I'm ignoring something..
Thanks in advance,
Flavio