Hi Santosh,

Spark is a distributed computation engine. You may ask: why distributed? Because when work is distributed, memory and cores can be added to process data in parallel at scale. Since it is difficult to scale vertically (a bigger single machine), we scale horizontally (more machines).
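To illustrate, here is a minimal Scala sketch (assuming Spark 3.x; the app name and partition count are arbitrary choices of mine, not anything standard) showing how a dataset is split into partitions that are then processed in parallel across the available cores:

import org.apache.spark.sql.SparkSession

object ParallelSum {
  def main(args: Array[String]): Unit = {
    // "local[4]" runs the engine on 4 cores of one machine.
    val spark = SparkSession.builder()
      .appName("ParallelSum")
      .master("local[4]")
      .getOrCreate()

    // parallelize() splits the range into 8 partitions; each partition is
    // processed independently, so the map() and sum() run in parallel.
    val rdd = spark.sparkContext.parallelize(1L to 1000000L, numSlices = 8)
    println(s"sum of doubles = ${rdd.map(_ * 2).sum()}")

    spark.stop()
  }
}

The same code scales up simply by giving Spark more partitions and more cores to run them on.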
Thanks and regards,
Kushagra Deep

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Monday, 12 October 2020 at 11:23 PM
To: Santosh74 <sardesaisant...@gmail.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: Spark as computing engine vs spark cluster

Hi Santosh,

Generally speaking, there are two ways of making a process faster:

1. Do more intelligent work, e.g. by creating indexes, cubes etc., thus reducing the processing time.
2. Throw hardware and memory at it, using something like a Spark multi-node cluster with a fully managed cloud service such as Google Dataproc.

So the framework is the computational engine (Spark), and its physical realisation is a Spark cluster (multiple nodes/VM hosts) that work in tandem to provide parallel processing. I suggest that you look at the Spark docs: https://spark.apache.org/

HTH

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

On Sat, 10 Oct 2020 at 15:24, Santosh74 <sardesaisant...@gmail.com> wrote:

Is Spark a compute engine only, or is it also a cluster that comes with a set of hardware/nodes? What exactly is a Spark cluster?
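To make the engine-versus-cluster distinction above concrete, here is a short sketch (the master URLs and host name are illustrative, not from this thread): the application code stays the same whether it runs on a laptop or on a cluster; only the master URL, i.e. the physical realisation, changes.

import org.apache.spark.sql.SparkSession

// The engine (Spark) is the same in every case; the master URL selects the
// physical realisation that executes the work:
//   "local[*]"                 -> all cores of a single machine (no cluster)
//   "spark://master-host:7077" -> a Spark standalone cluster (host hypothetical)
//   "yarn"                     -> a Hadoop YARN cluster, e.g. Google Dataproc
val spark = SparkSession.builder()
  .appName("EngineVsCluster")
  .master(sys.env.getOrElse("SPARK_MASTER", "local[*]"))
  .getOrCreate()

In practice the master is usually supplied by spark-submit (its --master option) rather than hard-coded in the application.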