Hello Spark experts - I’m running Spark jobs in cluster mode, with a
dedicated cluster for each job. Is there a way to see how much compute time
each job takes via the Spark APIs, metrics, etc.? In case it makes a
difference, I’m using AWS EMR - I’d ultimately like to be able to say that
this job costs $X.
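One rough way to measure this, sketched below, is to query the Spark
driver's monitoring REST API: it reports executorRunTime per stage, which
can be summed, converted to core-hours, and multiplied by a price. This is
only a sketch under assumptions - the endpoint reported by sc.uiWebUrl must
be reachable from wherever the snippet runs (on EMR in cluster mode it may
sit behind the YARN proxy), and the price per core-hour is a placeholder to
replace with your own EMR/EC2 figures.

# Rough sketch (not authoritative): sum executorRunTime over completed stages
# via Spark's monitoring REST API and convert to core-hours for a cost estimate.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# sc.uiWebUrl points at the driver UI; on EMR in cluster mode this may be a
# YARN proxy URL - adjust if the direct port is not reachable.
base = f"{sc.uiWebUrl}/api/v1/applications/{sc.applicationId}"
stages = requests.get(f"{base}/stages?status=complete").json()

# executorRunTime is reported per stage in milliseconds
total_ms = sum(s.get("executorRunTime", 0) for s in stages)
core_hours = total_ms / 1000.0 / 3600.0

PRICE_PER_CORE_HOUR = 0.05  # placeholder - substitute your instance pricing
print(f"~{core_hours:.2f} executor core-hours, ~${core_hours * PRICE_PER_CORE_HOUR:.2f}")

The same REST endpoints are also served by the Spark history server for
completed applications, so a snippet like this could run as a post-hoc step
rather than inside the job itself.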
> .repartition(6)
> .cache()
> )
>
> On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote:
>
>> Hi Nebi, can you share the code you’re using to read and write from S3?
>>
>> On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
>>
>>> Hi all,
Hi Nebi, can you share the code you’re using to read and write from S3?
On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
> Hi all,
> I am using Spark on EMR to process data. Basically, I read data from AWS
> S3, do the transformation, and after the transformation I load/write the
> data back to S3.
>
>
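For context, here is a minimal, hypothetical sketch of the
read/transform/write pattern described above; the bucket names, file
format, and transformation are placeholders rather than the actual job code:

# Hypothetical sketch only - illustrates the S3 read/transform/write shape;
# bucket names, format and columns are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-transform-job").getOrCreate()

# Read raw input from S3 (EMR ships with an S3 filesystem connector)
df = spark.read.parquet("s3://my-input-bucket/raw/")

# Placeholder transformation: filter rows and add a derived column
out = (
    df.filter(F.col("status") == "active")
      .withColumn("ingest_date", F.current_date())
      .repartition(6)  # mirrors the repartition(6)/cache() quoted earlier
      .cache()
)

# Write the transformed data back to S3
out.write.mode("overwrite").parquet("s3://my-output-bucket/processed/")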
Hi Ruben,
I’m not sure if this answers your question, but if you’re interested in
exploring the underlying tables, you could always try something like the
below in a Databricks notebook:
display(spark.read.table('samples.nyctaxi.trips'))
(For vanilla Spark users, it would be
spark.read.table('samples.nyctaxi.trips').)
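For completeness, a small sketch of the equivalent exploration outside a
notebook, assuming a table named samples.nyctaxi.trips is registered in
your metastore (it is a Databricks sample dataset, so the name is
illustrative):

# Sketch: vanilla-Spark equivalent of the Databricks display() call above.
# Assumes samples.nyctaxi.trips exists in your metastore.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

trips = spark.read.table("samples.nyctaxi.trips")
trips.printSchema()   # inspect columns and types
trips.show(10)        # print the first 10 rows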