AWS has 2 managed offerings built on top of Spark: EMR and Glue. You can, of course, also spin up your own EC2 instances and deploy Spark on them. These 3 options let you trade off flexibility against infrastructure management. EC2 gives you the most flexibility, because it's basically a bunch of nodes and you can configure Spark any way you want. The con is that you need to manage both your EC2 instances and Spark yourself. EMR is a step up: you still manage your EC2 instances, but you don't need to manage Spark. With Glue, you don't need to manage infrastructure at all. Glue is serverless (for you).
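As a rough illustration of the EMR option above, here is a minimal Python sketch of handing a Spark job to an already running EMR cluster as a "step", so that EMR runs spark-submit for you. It assumes boto3 is installed and AWS credentials are configured; the cluster id, region, bucket, and script path are made-up placeholders, not anything from this thread.

```python
def build_spark_step(name, script_s3_uri, script_args=()):
    """Build an EMR step definition that runs spark-submit on the cluster.

    command-runner.jar is the EMR-provided launcher for running commands
    such as spark-submit as a step.
    """
    return {
        "Name": name,
        # Keep the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_s3_uri,
                *script_args,
            ],
        },
    }


def submit_step(cluster_id, step, region="us-east-1"):
    """Submit the step to an existing cluster and return the new step id."""
    # Imported here so build_spark_step() can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Hypothetical usage: the bucket and script path are placeholders.
step = build_spark_step(
    "daily-report",
    "s3://example-bucket/jobs/report.py",
    ["--date", "2021-02-24"],
)
# step_id = submit_step("j-EXAMPLECLUSTER", step)  # needs real AWS access
```

With plain EC2 you would have to install and configure Spark on the nodes before you could run the equivalent spark-submit yourself.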
Besides those, you also get other choices. For example, if your usage is spiky, you could implement this in Kinesis. Or you could have your reporting application make queries to Athena.

On 2/24/21, 10:25 AM, "Stephane Verlet" <for...@verlet.name> wrote:

Hello,

We have been using Spark on an on-premise cluster for several years and are looking at moving to a cloud deployment.

I was wondering what your current favorite cloud setup is. Just simple AWS / Azure, or something on top like Databricks?

This would support an on-demand report application, so usage would be sporadic with spikes during the day. The current deployment is Spark with Hive data.

Thanks for sharing

Stephane