Loop-invariant data should be kept in Flink's managed memory in serialized form (in a custom hash table). That means the data is not read again from the CSV file in each iteration, but because it is held in serialized form, it needs to be deserialized again on access.

CC'ing Fabian to double check...

On Mon, Apr 24, 2017 at 4:20 PM, Robert Schwarzenberg <schwarzenb...@campus.tu-berlin.de> wrote:
> Hello,
>
> I have a question regarding the loop-awareness of Flink wrt invariant
> datasets.
>
> Does Flink serialize the DataSet 'points' in line 85
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala
>
> each iteration or are there in-memory optimization procedures in place?
>
> Thanks for your help!
>
> Regards,
> Robert