It is possible to start multiple concurrent drivers. Spark dynamically
allocates ports per "Spark application" on the driver, master, and workers
from a port range. When you collect results back to the driver, they do not
go through the master; the master is mostly there as a coordinator between
the driver and the workers.
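
A minimal sketch of starting one such application with its driver port
pinned rather than taken from the dynamic range, assuming a standalone
cluster; the app name, master URL, and port values are placeholders:

  import org.apache.spark.{SparkConf, SparkContext}

  // Placeholder app name, master URL, and ports; adjust for your cluster.
  val conf = new SparkConf()
    .setAppName("job-A") // each concurrent submission is its own "Spark application"
    .setMaster("spark://master-host:7077")
    .set("spark.driver.port", "40000")  // pin the driver port instead of a random one
    .set("spark.port.maxRetries", "32") // retry successive ports if 40000 is taken
  val sc = new SparkContext(conf)
  // ... run the job ...
  sc.stop()

Submitting a second copy with its own app name (and, if pinned, its own
base port) gives you a second, fully independent driver.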
Hi,
as far as I understand, you shouldn't send data to the driver. Suppose you
have a file in HDFS/S3 or a Cassandra partitioning: you should create your
job such that every Spark executor/worker handles part of your input,
transforms and filters it, and at the end writes back to Cassandra as
output (once again, in parallel on the executors).
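
A rough sketch of that pattern with the spark-cassandra-connector; the
keyspace, table, and column names are made up for illustration:

  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._ // spark-cassandra-connector

  val conf = new SparkConf()
    .setAppName("transform-events")
    .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
  val sc = new SparkContext(conf)

  // Each executor reads its share of the Cassandra partitions,
  // filters and transforms locally, and writes back in parallel.
  // Nothing is collected to the driver.
  sc.cassandraTable("ks", "raw_events")
    .filter(row => row.getInt("status") == 200)
    .map(row => (row.getString("id"), row.getLong("ts")))
    .saveToCassandra("ks", "clean_events", SomeColumns("id", "ts"))

  sc.stop()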
Hello,
I am relatively new to Spark and I am currently trying to understand how
to scale large numbers of jobs with it.
I understand that the Spark architecture is split into a "Driver", a
"Master", and "Workers". The Master has a standby node in case of failure,
and Workers can scale out.
All the examples I