Hi David,
please find my answers below:
1. For high utilization, all slot should be filled. Each slot will
processes a slice of the program on a slice of the data. In case of
partitioning or changed parallelism, the data is shuffled accordingly .
2. That's a good question. I think the default log
Hi Fabian,
Thank you for the great, detailed answers.
1. So, each parallel slice of the DAG is placed into one slot. The key to
high utilization is many slices of the source data (or the various methods
of repartitioning it). Yes?
2. In batch processing, are slots filled round-robin on task manag
Hi David,
Flink's DataSet API schedules one slice of a program to a task slot. A
program slice is one parallel instance of each operator of a program.
When all operator of your program run with a parallelism of 1, you end up
with only 1 slice that runs on a single slot.
Flink's DataSet API leverag
Hello -
I have a large number of pairs of files. For purpose of discussion:
/source1/{1..1} and /source2/{1..1}.
I want to join the files pair-wise: /source1/1 joined to /source2/1,
/source1/2 joined to /source2/2, and so on.
I then want to union the results of the pair-wise joins and per