*Thanks so much, Aljoscha* :) I was stuck at this point. I didn't know that the print and collect methods gather all the data in one place.
The execution time has dropped a lot. However, I still find that Flink is slower (by just 7 seconds). I really think I'm not getting all the performance out of Flink. As I understand it, Flink can represent the execution as a cyclic dataflow graph, whereas Spark uses a DAG, so I expected Flink's approach to scale and perform better. So, what is the problem with my code?

    // Read data
    val data: DataSet[org.apache.flink.ml.common.LabeledVector] =
      MLUtils.readLibSVM(benv, "/inputPath/_.libsvm")

    // Create multiple linear regression learner
    val mlr = MultipleLinearRegression()
    val model = mlr.fit(data)

    // Write the data out and trigger execution
    data.writeAsText("file:///outputPath")
    benv.execute()

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Methods-that-trigger-execution-tp12972p13537.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.