Hello Flinkers,

I am experimenting a bit with the DataSet API and have written a simple program that joins two (key, value) datasets by key. The server on which I am running my experiments has 12 cores with 4 threads each, so I have set the number of slots per TaskManager to 12x4=48 to make use of the full parallelism. I then run the same join with different levels of parallelism.
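A minimal sketch of a job with this shape (not the actual program from this post) might look like the following. It assumes the Flink Java DataSet API (`flink-java` on the classpath). The class name `JoinParallelismSketch`, the helpers `keyValues` and `joinAndCount`, and the generated data are all hypothetical. The timing is plain wall-clock time measured around `count()`, which triggers its own job execution.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class JoinParallelismSketch {

    // Hypothetical data: n (key, value) pairs with unique keys 0..n-1.
    static DataSet<Tuple2<Long, Long>> keyValues(ExecutionEnvironment env, long n) {
        return env.generateSequence(0, n - 1)
                .map(new MapFunction<Long, Tuple2<Long, Long>>() {
                    @Override
                    public Tuple2<Long, Long> map(Long i) {
                        return Tuple2.of(i, i * 2);
                    }
                });
    }

    // Joins two key-value datasets on field 0 and counts the result
    // at the given parallelism, printing the wall-clock time.
    static long joinAndCount(int parallelism, long n) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(parallelism);

        DataSet<Tuple2<Long, Long>> left = keyValues(env, n);
        DataSet<Tuple2<Long, Long>> right = keyValues(env, n);

        long start = System.nanoTime();
        // count() is an eager action: it submits and runs the job immediately.
        long count = left.join(right).where(0).equalTo(0).count();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("parallelism=%d count=%d time=%d ms%n",
                parallelism, count, elapsedMs);
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Parallelism levels are passed on the command line, e.g. "1 10 20 48".
        String[] levels = args.length > 0 ? args : new String[] {"1"};
        for (String level : levels) {
            joinAndCount(Integer.parseInt(level), 1_000_000L);
        }
    }
}
```

Because every key appears exactly once on each side, the join produces exactly `n` pairs, so the count is a quick sanity check that each run did the same work regardless of parallelism.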
I perform the join and call count() on the result. The running time of the experiment executed with parallelism 48 is EQUAL (?!?!?) to the running time of the experiment with parallelism 1, 10, or 20. How is this possible? It does not make sense; I expected to see at least some difference. If you have any ideas, please share!

Best,
Max

P.S. Also, is there any DummySink for the DataSet API like in the DataStream API? For now I only care about enumerating the result. Using count() does not let me call env.execute(), and I would like to get getNetRuntime() from env afterwards.