Re: Website Down

2025-02-05 Thread walt
On 2025-02-06 05:12, Will Dumas wrote: Good afternoon, I am a student and need Spark for a lab, but the website appears to be broken. Do you think this will be fixed today? Will, https://spark.apache.org/ does not work right for me either.

Re: about cpu cores

2022-07-11 Thread Yong Walt
(can be set to >1 too, but that's a different story). If there are more tasks ready to execute than available cores, some tasks simply wait.

On Sun, Jul 10, 2022 at 3:31 AM Yong Walt wrote:
> given my spark cluster has 128 cores totally
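The queuing behavior described above can be sketched with a plain-Python analogy: a thread pool with a fixed worker count stands in for a fixed number of executor cores. This is not Spark code, just an illustration that tasks beyond the core count wait in a queue rather than fail:

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# Analogy (not Spark itself): a pool with 4 "cores" receives 8 tasks.
# Tasks beyond the worker count are queued, not rejected -- the same
# way Spark queues tasks when more are ready than there are cores.
CORES = 4
running = 0
peak = 0
lock = threading.Lock()

def task(i):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)          # pretend to do some work
    with lock:
        running -= 1
    return i

with ThreadPoolExecutor(max_workers=CORES) as pool:
    results = list(pool.map(task, range(8)))

print(sorted(results))  # all 8 tasks eventually complete
print(peak <= CORES)    # never more than 4 ran at once
```

All eight tasks finish; the pool simply holds the surplus until a worker frees up, which mirrors the "some tasks simply wait" behavior above.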

about cpu cores

2022-07-10 Thread Yong Walt
Given that my Spark cluster has 128 cores in total: if I submit more than 128 jobs (each job assigned only one core) to the cluster, what will happen? Thank you.

Re: Will it lead to OOM error?

2022-06-22 Thread Yong Walt
We have many cases like this. It won't cause OOM. Thanks.

On Wed, Jun 22, 2022 at 8:28 PM Sid wrote:
> I have a 150 TB CSV file. I have a total of 100 TB RAM and 100 TB disk. So if I do something like this:
>
> spark.read.option("header","true").csv(filepath).show(false)
>
> Will it lead to an OOM error?
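The reason show() on a freshly-read file is safe is lazy evaluation: show() only pulls the first ~20 rows through the pipeline, so the file is never loaded whole. A plain-Python analogy (not Spark code) with a generator standing in for the lazily-evaluated DataFrame:

```python
from itertools import islice
import csv
import io

# Analogy (not Spark): rows flow through one at a time; nothing is
# buffered, so only the rows actually requested are ever materialized.
def read_rows(f):
    for row in csv.reader(f):
        yield row

# Pretend this file were 150 TB -- we still only touch 3 data rows.
data = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n")
rows = read_rows(data)
header = next(rows)
first_three = list(islice(rows, 3))
print(header)       # ['a', 'b']
print(first_three)  # [['1', '2'], ['3', '4'], ['5', '6']]
```

The fourth data row is never read, just as show() on a 150 TB CSV never scans the whole file.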

Re: Spark Doubts

2022-06-21 Thread Yong Walt
These are the basic concepts in Spark :) You may take a bit of time to read this small book: https://cloudcache.net/resume/PDDWS2-V2.pdf Regards.

On Wed, Jun 22, 2022 at 3:17 AM Sid wrote:
> Hi Team, I have a few doubts about the below questions:
> 1) Where will the data frame reside? Memory? Disk?
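One of the basics the book covers, sketched here as a plain-Python analogy (not Spark code): a DataFrame is a lazy plan, not data sitting in memory. Transformations only compose the plan; rows are materialized only when an action runs:

```python
# Analogy (not Spark): a DataFrame is a *plan*, not rows in memory.
# map/filter below compose lazily, like Spark transformations.
plan = map(lambda x: x * 2, range(5))   # nothing computed yet
plan = filter(lambda x: x > 2, plan)    # still nothing computed

result = list(plan)  # the "action": rows are materialized now
print(result)        # [4, 6, 8]
```

Until the action, the "data frame" resides nowhere; in Spark, materialized partitions live in executor memory and can spill to disk depending on the storage level.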

Re: input file size

2022-06-18 Thread Yong Walt
import java.io.File
val someFile = new File("somefile.txt")
val fileSize = someFile.length

This one?

On Sun, Jun 19, 2022 at 4:33 AM mbreuer wrote:
> Hello Community, I am working on optimizations for file sizes and number of files. In the data frame there is a function input_file_name which
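For reference, the same local-file-size check as the Scala snippet above, written in Python. Note this only measures a file on the local filesystem, which may not be what the original question (about input_file_name inside a DataFrame) needs:

```python
import os
import tempfile

# Python equivalent of `new File(...).length` in the Scala snippet:
# report a local file's size in bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello spark")
    path = f.name

size = os.path.getsize(path)
print(size)  # 11 bytes for the sample content
os.remove(path)
```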

Re: spark-submit working differently than pyspark when trying to find external jars

2015-06-09 Thread Walt Schlender
I figured it out, in case anyone else has this problem in the future:

spark-submit --driver-class-path lib/postgresql-9.4-1201.jdbc4.jar --packages com.databricks:spark-csv_2.10:1.0.3 path/to/my/script.py

What I found is that you MUST put the path to your script at the end of the spark-submit command.