Hi,
Have you thought about splitting this into two independent jobs (or
calling execute() separately for the two parts)?
One job for the update() and one for the insert()?
Even though the update operation should not be expensive, I think it's
helpful to understand the performance impact of having c
Hi Robert,
Sorry, I should have been clearer in my initial mail. The two cases I was
comparing are:
1) distinct() before insert (which is necessary as we have a unique key
constraint in our database), no distinct() before update
2) distinct() before insert AND distinct() before update
The test
Hi Max,
Is the distinct() operation reducing the size of the DataSet? If so, I
assume you have an idempotent update and the job is faster because fewer
updates are done?
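To illustrate that first possibility, here is a minimal plain-Java sketch (not Flink code, and not the actual job). It assumes the update is idempotent, meaning that applying the same key/value pair twice leaves the table unchanged. FakeTable, applyUpdates, sampleUpdates and the sample data are hypothetical names made up for this illustration; the in-memory map only stands in for the real database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class DistinctUpdateSketch {

    /** Hypothetical stand-in for the database table; counts the statements issued. */
    static class FakeTable {
        final Map<String, Integer> rows = new HashMap<>();
        int updatesIssued = 0;

        // Idempotent update: writing the same key/value again changes nothing.
        void update(Map.Entry<String, Integer> row) {
            rows.put(row.getKey(), row.getValue());
            updatesIssued++;
        }
    }

    /** Removes exact duplicates while keeping first-seen order. */
    static List<Map.Entry<String, Integer>> distinct(List<Map.Entry<String, Integer>> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    /** Applies every row as one update statement against a fresh table. */
    static FakeTable applyUpdates(List<Map.Entry<String, Integer>> input) {
        FakeTable table = new FakeTable();
        for (Map.Entry<String, Integer> row : input) {
            table.update(row);
        }
        return table;
    }

    /** Hypothetical sample data containing duplicates. */
    static List<Map.Entry<String, Integer>> sampleUpdates() {
        return List.of(
                Map.entry("a", 1), Map.entry("b", 2),
                Map.entry("a", 1), Map.entry("b", 2),
                Map.entry("a", 1));
    }

    public static void main(String[] args) {
        FakeTable plain = applyUpdates(sampleUpdates());
        FakeTable deduped = applyUpdates(distinct(sampleUpdates()));

        System.out.println("updates without distinct: " + plain.updatesIssued);
        System.out.println("updates with distinct:    " + deduped.updatesIssued);
        System.out.println("same final table:         " + plain.rows.equals(deduped.rows));
    }
}
```

If the duplicates are exact copies, both variants end with the same table contents, and only the number of statements sent to the database differs.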
If the distinct() operator is not changing anything, then the job might be
faster because the INSERT is done while Flink is sti