Please avoid collecting the data to the client with collect(). The
operation looks convenient, but it is only meant for very small data sets.
Even where it works for large data sets, it is much slower and less robust.
Instead, set the parallelism of the operator to 1.
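
To illustrate setting the parallelism to 1, here is a minimal sketch,
assuming the DataSet API. The class names, the sample input, and the output
path are made up for the example, and the function is only a stand-in for
the first-occurrence step, not the actual algorithm. The position it emits
is the position in the order in which the single operator instance receives
the words.

```java
import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;

public class FirstOccurrenceJob {

    // Hypothetical operator: emits (word, position) the first time a word is seen.
    public static class FirstOccurrence
            implements MapPartitionFunction<String, Tuple2<String, Long>> {
        @Override
        public void mapPartition(Iterable<String> words,
                                 Collector<Tuple2<String, Long>> out) {
            Set<String> seen = new HashSet<>();
            long position = 0;
            for (String word : words) {
                if (seen.add(word)) {
                    out.collect(Tuple2.of(word, position));
                }
                position++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Sample input, assumed for illustration only.
        DataSet<String> words = env.fromElements("a", "b", "a", "c", "b");

        words.mapPartition(new FirstOccurrence())
             // One instance sees all records, so nothing is collected to the client.
             .setParallelism(1)
             // Hypothetical output path.
             .writeAsText("/tmp/first-occurrences");

        env.execute("first occurrence with parallelism 1");
    }
}
```

The work still runs on the cluster; only this one operator is executed by a
single task.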

Fabian

2017-01-05 13:18 GMT+01:00 Sebastian Neef <gehax...@mailbox.tu-berlin.de>:

> Hi Chesnay,
>
> thanks for the input. Finding a word's first occurrence is part of the
> algorithm.
>
> To be exact, I'm trying to implement Adler's text authorship tracking in
> Flink (http://www2007.org/papers/paper692.pdf, page 266).
>
> Thanks,
> Sebastian
>
