Please avoid collecting the data to the client with collect(). The operation looks convenient, but it is only meant for very small data sets; even where it works on large data sets, it is much slower and less robust. Instead, set the parallelism of the operator to 1.
Fabian

2017-01-05 13:18 GMT+01:00 Sebastian Neef <gehax...@mailbox.tu-berlin.de>:

> Hi Chesnay,
>
> thanks for the input. Finding a word's first occurrence is part of the
> algorithm.
>
> To be exact I'm trying to implement Adler's Text authorship tracking in
> flink (http://www2007.org/papers/paper692.pdf, page 266).
>
> Thanks,
> Sebastian