spark-submit --conf spark.hadoop.fs.permissions.umask-mode=007
You may also set the sticky bit on the staging dir.
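For reference, a minimal sketch of how those two suggestions might look; the YARN master, script name, and staging path are assumptions, not from the thread:

# pass a more permissive umask down to the Hadoop layer at submission time
spark-submit \
  --master yarn \
  --conf spark.hadoop.fs.permissions.umask-mode=007 \
  my_pyspark_job.py

# optionally set the sticky bit on a shared staging directory (path is illustrative)
hdfs dfs -chmod 1777 /user/shared/.sparkStaging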
> On 26 Feb 2021, at 03:29, Bulldog20630405 wrote:
we have a spark cluster running with multiple users...
when running as the user owning the cluster, jobs run fine... however,
when trying to run pyspark as a different user it fails, because
.sparkStaging/application_* is created with mode 700, so that user cannot
write to the directory.
how can this be fixed?
Hi Sachit,
I managed to make mine work using the *foreachBatch* function in
writeStream.
"foreach" performs custom write logic on each row, and "foreachBatch"
performs custom write logic on each micro-batch, through the SendToBigQuery
function here.
foreachBatch(SendToBigQuery) expects 2 parameters: first, the micro-batch
DataFrame itself, and second, a unique id for each micro-batch.
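For context, a minimal sketch of that foreachBatch pattern, assuming a PySpark Structured Streaming job; the rate source and the body of the sink function are placeholders, not the thread's actual SendToBigQuery:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreachBatchSketch").getOrCreate()

# any streaming source works; "rate" just keeps the sketch self-contained
streaming_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

def send_to_bigquery(batch_df, batch_id):
    # stand-in for the thread's SendToBigQuery: foreachBatch hands each
    # micro-batch to this function as a regular DataFrame plus a unique
    # batch id, so ordinary batch write logic can run here
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (streaming_df.writeStream
         .foreachBatch(send_to_bigquery)
         .trigger(processingTime="30 seconds")
         .start())
query.awaitTermination()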
Hi Ivan,
Sorry, but it always helps to know the version of Spark you are using, its
environment, the format you are writing your files out to, and any other
details if possible.
Regards,
Gourav Sengupta
On Wed, Feb 24, 2021 at 3:43 PM Ivan Petrov wrote:
> Hi, I'm trying to control the
Ah... makes sense, thank you. I tried sortWithinPartitions before and
replaced it with sort. It was a mistake.
Thu, 25 Feb 2021 at 15:25, Pietro Gentile <
pietro.gentile89.develo...@gmail.com>:
> Hi,
>
> It is because of *repartition* before the *sort* method invocation. If
> you reverse them you'
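A small sketch of the difference being discussed here, assuming a plain DataFrame; the column name and partition count are made up:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).withColumnRenamed("id", "value")

# sort() is a global sort and introduces its own shuffle, so a repartition()
# issued just before it is effectively thrown away
globally_sorted = df.repartition(8, "value").sort("value")

# sortWithinPartitions() keeps the repartitioning and only orders rows
# locally inside each partition (no global ordering, no extra shuffle)
locally_sorted = df.repartition(8, "value").sortWithinPartitions("value")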
Hi,
I am writing here because I need help/advice on how to perform aggregations
more efficiently.
In my current setup I have an Accumulator object which is used as the zeroValue
for the foldByKey function. This Accumulator object can get very large, since
the accumulations also include lists and maps.
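To make that setup concrete, a minimal sketch of the pattern described above; the field names inside the zero value and the sample records are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# hypothetical stand-in for the Accumulator object: a dict holding a list
# and a map, which is what makes the zero value (and each accumulator) large
zero = {"events": [], "counts": {}}

def to_acc(v):
    # wrap a single record into accumulator form so the fold sees one type
    return {"events": [v], "counts": {v: 1}}

def merge(a, b):
    # merge two accumulators; foldByKey also uses this to fold wrapped records
    a["events"].extend(b["events"])
    for k, n in b["counts"].items():
        a["counts"][k] = a["counts"].get(k, 0) + n
    return a

pairs = sc.parallelize([("a", 1), ("a", 2), ("b", 3)]).mapValues(to_acc)
result = pairs.foldByKey(zero, merge).collectAsMap()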
BTW, you intend to process these every 30 seconds?
processingTime="30 seconds"
So how many rows of data are sent in each micro-batch, and at what interval
do you receive the data in batches from the producer?
If you are receiving data from Kafka, wouldn't that be better in JSON
format?
try:
    # construct a streaming DataFrame streamingDataFrame that subscribes to topic
    # config['MDVariables']['topic'] -> md (market data); the Kafka source options
    # below complete the truncated snippet ('bootstrapServers' config key assumed)
    streamingDataFrame = self.spark \
        .readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
        .option("subscribe", config['MDVariables']['topic']) \
        .load()
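Continuing from the streamingDataFrame above, a hedged sketch of pulling the Kafka value column out as JSON; the schema fields are invented, since the real market-data layout is not shown in the thread:

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

# invented schema for the md (market data) payload
md_schema = StructType([
    StructField("ticker", StringType()),
    StructField("price", DoubleType()),
    StructField("timestamp", TimestampType()),
])

parsed = (streamingDataFrame
          .select(from_json(col("value").cast("string"), md_schema).alias("md"))
          .select("md.*"))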