AWS has two managed offerings built on top of Spark: EMR and Glue. You can, of 
course, also spin up your own EC2 instances and deploy Spark on them. These 
three options let you trade off flexibility against infrastructure management. 
EC2 gives you the most flexibility, because it's basically a bunch of nodes and 
you can configure Spark any way you want. The downside is that you need to 
manage both the EC2 instances and Spark yourself. EMR is a step up: you still 
manage your EC2 instances, but you don't need to manage Spark. With Glue, you 
don't need to manage infrastructure at all. Glue is serverless (from your 
point of view).

Besides those, you also have other choices. For example, if your usage is 
spiky, you could implement this with Kinesis. Or you could have your reporting 
application send its queries to Athena.
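
To make the Athena option a bit more concrete, here is a rough sketch of how a 
reporting application might submit a query and wait for the result through 
boto3. This is only an illustration under assumptions, not a recommendation 
for your setup: it assumes your Hive tables are already visible to Athena, and 
the helper name run_athena_query, the database name and the S3 output location 
are all made up for the example. The client is passed in so the function can 
be exercised without AWS credentials.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a query to Athena, wait for it to finish, return the rows.

    `client` is expected to behave like boto3.client("athena").
    `database` and `output_location` are placeholders you must supply.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(
                f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # Only the first page of results is fetched in this sketch.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

You would call it with something like 
run_athena_query(boto3.client("athena"), "SELECT ...", "reports", 
"s3://my-bucket/athena-results/"), where "reports" and the bucket are 
hypothetical names. Since Athena is billed per query rather than per running 
cluster, this style may suit sporadic usage, but check that against your own 
workload.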

On 2/24/21, 10:25 AM, "Stephane Verlet" <for...@verlet.name> wrote:

    Hello,

    We have been using Spark on an on-premise cluster for several years and
    are looking at moving to a cloud deployment.

    I was wondering what your current favorite cloud setup is.  Just plain
    AWS / Azure, or something on top like Databricks?

    This would support an on-demand reporting application, so usage would be
    sporadic with spikes during the day. The current deployment is Spark with
    Hive data.

    Thanks for sharing

    Stephane


