Hi Bryant,
The docs below are a good start on performance tuning:
https://spark.apache.org/docs/latest/sql-performance-tuning.html
Hope it helps!
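For example, a couple of the settings that page covers, as a rough PySpark sketch (the values are only illustrative, not recommendations for your workload):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Adaptive query execution re-optimises joins and shuffle partition counts at
# runtime (on by default in recent Spark versions; set here just to show the knob).
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Tables smaller than this threshold are broadcast to every executor instead of
# being shuffle-joined. 10 MB is the default; tune it to your data.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))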
On Wed, Nov 29, 2023 at 9:32 AM Bryant Wright
wrote:
> Hi, I'm looking for a comprehensive list of Tuning Best Practices for
> spark.
>
> I did a se
Hi Haseeb,
I think the user mailing list is what you're looking for; people are usually
pretty active on here if you present a direct question about Apache Spark.
I've linked the community guidelines below, which explain which mailing
lists are for what, etc.
https://spark.apache.org/community.html
Th
Hi,
I've got a number of tables that I'm loading in from a SQL Server. The
timestamps in SQL Server are stored like 2003-11-24T09:02:32. I get these as
parquet files in our raw storage location and pick them up in Databricks.
When I load the data in Databricks, the dataframe/Spark assumes UTC or
+000
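If it helps, this is roughly how I'd make the interpretation explicit; it's only a sketch, and the path, column name and source time zone below are placeholders:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# The session time zone controls how timestamps are rendered and converted;
# the parquet timestamp values themselves are stored as UTC instants.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = spark.read.parquet("/mnt/raw/sqlserver/my_table")  # placeholder path

# If the SQL Server values were really local wall-clock times but got read as
# UTC, shift them back into the zone they were recorded in.
df = df.withColumn(
    "modified_ts_local",
    F.from_utc_timestamp(F.col("modified_ts"), "Pacific/Auckland"),  # placeholder column/zone
)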
Hi,
There is some good documentation here:
https://docs.databricks.com/structured-streaming/query-recovery.html
The “recovery after change in structured streaming query” heading gives good
general guidelines on what can be changed during a “pause” of a stream.
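The main thing in practice is keeping the same checkpoint location across the restart; something like this (a sketch, the paths and formats are just examples):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.readStream.format("delta").load("/mnt/bronze/events")  # example source

query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")  # must stay the same across restarts
    .outputMode("append")
    .start("/mnt/silver/events")  # example sink
)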
On Thu, 16 Feb 2023 at
Hi,
I'm wanting to start contributing to the Spark project. Do I need a Jira
account at https://issues.apache.org/jira/projects/SPARK/summary before I'm
able to do this? If so, can one please be created with this email address?
Thank you
As far as I understand, you will need a GPU on each worker node, or you will
need to partition the GPU processing somehow across the nodes, which I think
would defeat the purpose. In Databricks, for example, when you select GPU
workers there is a GPU allocated to each worker. I assume this is the
“correct”
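On open-source Spark, the 3.x resource scheduling configs express the same one-GPU-per-executor idea. A rough sketch (the discovery script path is just the sample script Spark ships, and the task fraction is only an example):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # One GPU per executor, discovered via a script that reports the device IDs.
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/examples/src/main/scripts/getGpusResources.sh")
    # Let four tasks share the executor's single GPU.
    .config("spark.task.resource.gpu.amount", "0.25")
    .getOrCreate()
)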
When reading in Gzip files, I’ve always read them into a data frame and then
written them out to parquet/delta more or less in their raw form, and then used
those files for my transformations, since the workloads are now parallelisable
from the split files. When reading in Gzips these will be read by th
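Roughly the pattern I mean, as a sketch (the paths are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A .csv.gz file is not splittable, so this read happens on a single core.
raw = spark.read.option("header", "true").csv("/mnt/landing/export.csv.gz")  # placeholder path

# Written back out as parquet, the data is split across many files,
# so everything downstream can run in parallel.
raw.write.mode("overwrite").parquet("/mnt/raw/export")  # placeholder path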