Testing DataSet API's join

m@xi Mon, 06 May 2019 01:44:00 -0700

Hello Flinkers,

I am experimenting a bit with the DataSet API and have written a simple
program that joins two (key, value) datasets by key. The server on which I
am running my experiments has 12 cores with 4 threads each, so I have set
the number of slots for the TaskManager to 12x4=48 to leverage the full
parallelism. That said, I am running the same join with different levels of
parallelism.
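
As a rough illustration of what such an experiment computes, here is a minimal plain-Java sketch of a join-by-key followed by a count, run at several parallelism levels. It is not Flink code and not the actual program: the class `PartitionedJoinSketch`, its methods, and the sample data are all hypothetical, and the hash-partitioned strategy is an assumption (Flink's optimizer picks its own join strategy).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only; this is NOT Flink code. All names are hypothetical.
final class PartitionedJoinSketch {

    // Joins two (key, value) lists by key and returns the number of joined
    // pairs, using `parallelism` independent partitions (one task each).
    static long joinCount(List<Map.Entry<Integer, String>> left,
                          List<Map.Entry<Integer, String>> right,
                          int parallelism) {
        // Hash-partition both inputs so that equal keys land in the same partition.
        List<List<Map.Entry<Integer, String>>> leftParts = partition(left, parallelism);
        List<List<Map.Entry<Integer, String>>> rightParts = partition(right, parallelism);

        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int p = 0; p < parallelism; p++) {
                final List<Map.Entry<Integer, String>> l = leftParts.get(p);
                final List<Map.Entry<Integer, String>> r = rightParts.get(p);
                Callable<Long> task = () -> joinPartition(l, r);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // sum the per-partition match counts
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("join task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Distributes records over `parallelism` buckets by key hash.
    static List<List<Map.Entry<Integer, String>>> partition(
            List<Map.Entry<Integer, String>> input, int parallelism) {
        List<List<Map.Entry<Integer, String>>> parts = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            parts.add(new ArrayList<>());
        }
        for (Map.Entry<Integer, String> e : input) {
            parts.get(Math.floorMod(e.getKey().hashCode(), parallelism)).add(e);
        }
        return parts;
    }

    // Hash join within one partition: count left records per key, then probe
    // with the right side. Only the number of matches is needed, not the pairs.
    static long joinPartition(List<Map.Entry<Integer, String>> left,
                              List<Map.Entry<Integer, String>> right) {
        Map<Integer, Integer> leftCounts = new HashMap<>();
        for (Map.Entry<Integer, String> e : left) {
            leftCounts.merge(e.getKey(), 1, Integer::sum);
        }
        long matches = 0;
        for (Map.Entry<Integer, String> e : right) {
            matches += leftCounts.getOrDefault(e.getKey(), 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to have something to join.
        List<Map.Entry<Integer, String>> left = new ArrayList<>();
        List<Map.Entry<Integer, String>> right = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            left.add(Map.entry(i % 1_000, "l" + i));
            right.add(Map.entry(i % 2_000, "r" + i));
        }
        for (int parallelism : new int[] {1, 10, 20, 48}) {
            long start = System.nanoTime();
            long count = joinCount(left, right, parallelism);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("parallelism=" + parallelism
                    + " count=" + count + " ms=" + millis);
        }
    }
}
```

In this sketch the count does not depend on `parallelism`; only the measured running time can vary between runs.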


I do the join and count() the result. The running time of the experiment
executed with parallelism 48 is EQUAL (?!?!?) to the running time of the
experiment with parallelism 1, 10, or 20. How is this possible?

It does not make sense. I expected to see at least some difference. If you
have any ideas, please share! 

Best,
Max

P.S. Also, is there a DummySink for the DataSet API, like in the DataStream
API? I only care about enumerating the result for now. count() does not let
me call env.execute(), and I would like to get getNetRuntime() from env
afterwards.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
