Hi Fabian,
Thanks for your directions! They worked flawlessly. I am aware of the
reduced robustness, but then again my input is only available on each
worker and not replicated. In case anyone is wondering, here is how I did
it:
https://github.com/robert-schmidtke/hdfs-statistics-adapter/tree/2a4
Hi Robert,
This is indeed a bit tricky to do. The difficulty lies mostly in the
generation of the input splits, the setup of Flink, and the scheduling of tasks.
1) You have to ensure that at least one DataSource task is scheduled on
each worker. The easiest way to do this is to have a bare-metal setup (