Hello Spark experts - I’m running Spark jobs in cluster mode, with a
dedicated cluster for each job. Is there a way to see how much compute time
each job takes via the Spark APIs, metrics, etc.? In case it makes a
difference, I’m using AWS EMR - I’d ultimately like to be able to say that
this job costs $X.
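One rough way to measure this, sketched below, is to query the Spark
driver's monitoring REST API: it reports executorRunTime per stage, which
can be summed, converted to core-hours, and multiplied by a price. This is
only a sketch under assumptions - the endpoint reported by sc.uiWebUrl must
be reachable from wherever the snippet runs (on EMR in cluster mode it may
sit behind the YARN proxy), and the price per core-hour is a placeholder to
replace with your own EMR/EC2 figures.

# Rough sketch (not authoritative): sum executorRunTime over completed stages
# via Spark's monitoring REST API and convert to core-hours for a cost estimate.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# sc.uiWebUrl points at the driver UI; on EMR in cluster mode this may be a
# YARN proxy URL - adjust if the direct port is not reachable.
base = f"{sc.uiWebUrl}/api/v1/applications/{sc.applicationId}"
stages = requests.get(f"{base}/stages?status=complete").json()

# executorRunTime is reported per stage in milliseconds
total_ms = sum(s.get("executorRunTime", 0) for s in stages)
core_hours = total_ms / 1000.0 / 3600.0

PRICE_PER_CORE_HOUR = 0.05  # placeholder - substitute your instance pricing
print(f"~{core_hours:.2f} executor core-hours, ~${core_hours * PRICE_PER_CORE_HOUR:.2f}")

The same REST endpoints are also served by the Spark history server for
completed applications, so a snippet like this could run as a post-hoc step
rather than inside the job itself.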
> .repartition(6)
> .cache()
> )
>
> On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote:
>
>> Hi Nebi, can you share the code you’re using to read and write from S3?
>>
>> On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
>>
>>> Hi all,
Hi Nebi, can you share the code you’re using to read and write from S3?
On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
> Hi all,
> I am using Spark on EMR to process data. Basically, I read data from AWS
> S3, do the transformation, and after the transformation I load/write the
> data back to S3.
>
>
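For context, here is a minimal, hypothetical sketch of the
read/transform/write pattern described above; the bucket names, file
format, and transformation are placeholders rather than the actual job code:

# Hypothetical sketch only - illustrates the S3 read/transform/write shape;
# bucket names, format and columns are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-transform-job").getOrCreate()

# Read raw input from S3 (EMR ships with an S3 filesystem connector)
df = spark.read.parquet("s3://my-input-bucket/raw/")

# Placeholder transformation: filter rows and add a derived column
out = (
    df.filter(F.col("status") == "active")
      .withColumn("ingest_date", F.current_date())
      .repartition(6)  # mirrors the repartition(6)/cache() quoted earlier
      .cache()
)

# Write the transformed data back to S3
out.write.mode("overwrite").parquet("s3://my-output-bucket/processed/")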
Hi Ruben,
I’m not sure if this answers your question, but if you’re interested in
exploring the underlying tables, you could always try something like the
below in a Databricks notebook:
display(spark.read.table('samples.nyctaxi.trips'))
(For vanilla Spark users, it would be
spark.read.table('samples.nyctaxi.trips').)
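For completeness, a small sketch of the equivalent exploration outside a
notebook, assuming a table named samples.nyctaxi.trips is registered in
your metastore (it is a Databricks sample dataset, so the name is
illustrative):

# Sketch: vanilla-Spark equivalent of the Databricks display() call above.
# Assumes samples.nyctaxi.trips exists in your metastore.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

trips = spark.read.table("samples.nyctaxi.trips")
trips.printSchema()   # inspect columns and types
trips.show(10)        # print the first 10 rows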