Re: What are your experiences using google cloud platform

2022-01-24 Thread Mich Talebzadeh
s > > > > arguments > > countSparkDF > > > > newColumName: > > results from column sum will be sorted here > > > > columnNames: > > list of columns to sum > > > &

Re: What are your experiences using google cloud platform

2022-01-24 Thread Andrew Davidson
wSumsImpl BEGIN" )
# https://stackoverflow.com/a/54283997/4586180
retDF = countsSparkDF.na.fill( 0 ).withColumn( newColName , reduce( add, [col( x ) for x in columnNames] ) )
self.logger.warn( "rowSumsImpl END\n" )
return retDF

From: Mich Talebzade
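The row-sum snippet above folds a list of columns together with `reduce( add, ... )` after replacing nulls with 0. As a rough illustration of that fold only, here is a plain-Python analogue that works on a list of dicts rather than a Spark DataFrame. The function name `row_sums` and the sample rows are hypothetical and not taken from the thread; it assumes `column_names` is non-empty.

```python
from functools import reduce
from operator import add


def row_sums(rows, column_names, new_col_name):
    """Return copies of rows with new_col_name set to the sum of column_names."""
    result = []
    for row in rows:
        # Analogue of na.fill( 0 ): treat None values as 0 before summing.
        filled = {k: (0 if v is None else v) for k, v in row.items()}
        # Analogue of reduce( add, [col( x ) for x in columnNames] ):
        # fold the selected values pairwise with +.
        total = reduce(add, [filled.get(name, 0) for name in column_names])
        result.append({**filled, new_col_name: total})
    return result


# Hypothetical sample data, for illustration only.
rows = [{"a": 1, "b": 2, "c": None}, {"a": 4, "b": None, "c": 6}]
summed = row_sums(rows, ["a", "b", "c"], "total")
```

In Spark the same fold builds a single column expression that is evaluated lazily across the cluster, so this sketch says nothing about the performance of the original job.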

Re: What are your experiences using google cloud platform

2022-01-24 Thread Mich Talebzadeh
Dataproc works fine. The current version is Spark 3.1.2. Look at your code, hardware and scaling. HTH

Re: What are your experiences using google cloud platform

2022-01-23 Thread German Schiavon
Hi, changing cloud providers won't help if your job is slow, has skew, etc. I think you first have to find out why "big jobs" are not completing. On Sun, 23 Jan 2022 at 22:18, Andrew Davidson wrote: > Hi recently started using GCP dataproc spark. > > > > Seem to have trouble getting big jobs to c
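One possible way to check for the skew mentioned above is to count records per key and compare the largest count with the mean. This is a minimal plain-Python sketch, assuming a sample of the join or grouping keys can be pulled out of the job. The function name `skew_ratio` and the sample keys are hypothetical.

```python
from collections import Counter


def skew_ratio(keys):
    """Return the largest per-key record count divided by the mean count."""
    counts = Counter(keys)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Hypothetical key sample: one key holds most of the records.
sample_keys = ["a"] * 8 + ["b", "c"]
ratio = skew_ratio(sample_keys)
```

A ratio close to 1 suggests evenly spread keys. A much larger ratio suggests a few keys carry most of the data, which can leave a handful of tasks running long after the rest have finished.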

What are your experiences using google cloud platform

2022-01-23 Thread Andrew Davidson
Hi, I recently started using GCP Dataproc Spark. I seem to have trouble getting big jobs to complete. I am using checkpoints. I am wondering if maybe I should look for another cloud solution. Kind regards, Andy