Hi, as far as I understand, you shouldn't send the data to the driver. Suppose your input is a file in HDFS/S3 or a partitioned Cassandra table. You should design your job so that every Spark executor/worker handles its own part of the input, transforms and filters it, and at the end writes its part back to Cassandra as output. In other words, every executor (each core inside a worker) writes part of the output; in your case, each one writes part of the report.
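To make the data flow concrete, here is a toy, pure-Python model of the pattern. It is not Spark code and does not use any Spark or Cassandra API. The input is split into partitions, and each "worker" filters and formats only its own partition. Each worker then writes its own `part-NNNNN.csv` file, in the spirit of Spark's per-task output parts. Only a tiny summary (a row count per partition) travels back to the "driver". The record layout, the `amount > 100` filter and the function names are all hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    # Round-robin split; Spark would instead use the input's own partitioning
    return [rows[i::num_partitions] for i in range(num_partitions)]


def process_partition(task):
    """Runs on a 'worker': filter/transform ONE partition and write its own part file."""
    part_id, rows, out_dir = task
    # Hypothetical report logic: keep large amounts, format as CSV lines
    kept = [f"{r['id']},{r['amount']}" for r in rows if r["amount"] > 100]
    path = os.path.join(out_dir, f"part-{part_id:05d}.csv")
    with open(path, "w") as f:
        f.write("\n".join(kept))
    # Only a small summary goes back to the 'driver', never the rows themselves
    return part_id, len(kept)


def run_report(rows, out_dir, num_partitions=4):
    partitions = split_into_partitions(rows, num_partitions)
    tasks = [(i, part, out_dir) for i, part in enumerate(partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        summaries = list(pool.map(process_partition, tasks))
    return dict(summaries)  # {partition_id: rows_written}


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(50)]
    with tempfile.TemporaryDirectory() as out_dir:
        print(run_report(rows, out_dir))
        print(sorted(os.listdir(out_dir)))
```

In a real Spark job, the equivalent is to write the transformed DataFrame/RDD directly from the executors rather than calling `collect()` and writing from the driver. For Cassandra, that write would go through the Cassandra connector.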
In general, I find that submitting multiple jobs to the same Spark context (i.e., the same driver) performs better, because you don't pay the startup/shutdown cost for every job. For this, some people use a REST server that submits jobs to a long-running Spark context (driver). I'm not sure you can run multiple concurrent drivers, because of the ports each driver needs.

On 4 June 2015 at 17:30, Giuseppe Sarno wrote:
> Hello,
>
> I am relatively new to spark and I am currently trying to understand how to scale large numbers of jobs with spark.
>
> I understand that spark architecture is split in “Driver”, “Master” and “Workers”. Master has a standby node in case of failure and workers can scale out.
>
> All the examples I have seen show Spark being able to distribute the load to the workers and returning small amount of data to the Driver. In my case I would like to explore the scenario where I need to generate a large report on data stored on Cassandra and understand how Spark architecture will handle this case when multiple report jobs will be running in parallel.
>
> According to this presentation https://trongkhoanguyenblog.wordpress.com/2015/01/07/understand-the-spark-deployment-modes/ responses from workers go through the Master and finally to the Driver. Does this mean that the Driver and/or Master is a single point for all the responses coming back from workers?
>
> Is it possible to start multiple concurrent Drivers?
>
> Regards,
>
> Giuseppe.
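The reply's point about reusing one long-running driver can be illustrated with another toy, pure-Python model. This is not Spark code. `LongRunningContext` is a hypothetical stand-in that pays a simulated startup cost once. After that, it accepts many jobs, which may run concurrently on a thread pool. In real Spark, multiple jobs inside one SparkContext can run in parallel when they are submitted from separate threads, and `spark.scheduler.mode=FAIR` controls how they share resources. The class, `report_job`, and the timing parameter here are illustrative assumptions only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LongRunningContext:
    """Hypothetical stand-in for a long-lived driver/context; NOT Spark's API."""

    def __init__(self, startup_seconds=0.05, max_concurrent_jobs=4):
        # Simulated startup cost, paid once for the lifetime of the context
        time.sleep(startup_seconds)
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent_jobs)

    def submit(self, job, *args):
        # Each job reuses the already-started context and may run concurrently
        return self._pool.submit(job, *args)

    def stop(self):
        # Shutdown cost is also paid once, after all submitted jobs finish
        self._pool.shutdown(wait=True)


def report_job(n):
    # Hypothetical "report": the sum of 0..n-1
    return sum(range(n))


if __name__ == "__main__":
    ctx = LongRunningContext()
    futures = [ctx.submit(report_job, n) for n in (10, 100, 1000)]
    print([f.result() for f in futures])
    ctx.stop()
```

Starting a fresh context per report would pay `startup_seconds` (and the shutdown) for every job. A shared context pays it once, which is the saving the reply describes.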
