Re: multiple query with structured streaming in spark does not work

2021-05-21 Thread Amit Joshi
Hi Jian, I found this link that could be useful: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application. By the way, you could also try giving enough resources to run both jobs without defining the scheduler, i.e. run the queries with the default scheduler but provide …
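On the "enough resources" point for local testing, a minimal sketch (not from the thread, master string and app name are illustrative): give the application more than one core so two streaming queries can actually run side by side.

    from pyspark.sql import SparkSession

    # "local" gives Spark a single thread; "local[*]" (or e.g. "local[4]")
    # leaves room for the tasks of two concurrent streaming queries.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("two-streaming-queries")
        .getOrCreate()
    )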

RE: multiple query with structured streaming in spark does not work

2021-05-21 Thread jianxu
Hi Amit, Further to my last email, I managed to set the scheduler to FAIR via code: conf = SparkConf().setMaster("local").setAppName("HSMSTest1").set("spark.scheduler.mode", "FAIR"). I can see the mode has changed in the web UI, though the result is the same; it still does not work out, and it mig…
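With FAIR mode set on the conf, each query can additionally be steered into its own scheduler pool. A hedged sketch follows; the pool and query names are made up, the rate source is a placeholder for the real sensor topic, and it assumes the thread-local pool property is inherited by each query's execution thread when start() is called.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]").appName("fair-pools")
        .config("spark.scheduler.mode", "FAIR")
        .getOrCreate()
    )
    stream = spark.readStream.format("rate").load()   # placeholder source

    # Set the pool before starting each query; pool names are arbitrary.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
    q1 = stream.writeStream.format("console").queryName("q1").start()

    spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
    q2 = stream.writeStream.format("console").queryName("q2").start()

    spark.streams.awaitAnyTermination()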

RE: multiple query with structured streaming in spark does not work

2021-05-21 Thread jianxu
Hi Amit, Thank you for your prompt reply and kind help. I wonder how to set the scheduler to FAIR mode in Python; the following code does not seem to work: conf = SparkConf().setMaster("local").setAppName("HSMSTest1") sc = SparkContext(conf=conf) sc.setLocalProperty('spark.scheduler.…
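For reference, a minimal sketch of the distinction: spark.scheduler.mode is a SparkConf setting and has to be in place before the SparkContext is created, whereas sc.setLocalProperty() only sets per-thread properties such as spark.scheduler.pool and cannot switch the scheduler mode afterwards.

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster("local[*]")
        .setAppName("HSMSTest1")
        .set("spark.scheduler.mode", "FAIR")   # set here, before the context exists
    )
    sc = SparkContext(conf=conf)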

Re: multiple query with structured streaming in spark does not work

2021-05-21 Thread Amit Joshi
Hi Jian, You have to use the same Spark session to run all the queries, and use the following to wait for termination: q1 = writestream.start() q2 = writestream2.start() spark.streams.awaitAnyTermination(). Also set the scheduler in the Spark config to the FAIR scheduler. Regards, Amit Joshi. On Saturday …
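A minimal, self-contained sketch of that pattern (the rate source and console sinks are placeholders for the real sensor topic and sinks):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("two-queries")
        .getOrCreate()
    )

    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # Both queries come from the same SparkSession.
    q1 = stream.writeStream.format("console").queryName("q1").start()
    q2 = stream.writeStream.format("console").queryName("q2").start()

    # Block on the session's stream manager rather than on q1 alone; calling
    # q1.awaitTermination() before q2 is started would block the driver and
    # the second query would never run.
    spark.streams.awaitAnyTermination()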

multiple query with structured streaming in spark does not work

2021-05-21 Thread jianxu
Hi there, I am new to Spark. We are using Spark to develop our app for data streaming with sensor readings. I am having trouble getting two queries with Structured Streaming working concurrently. The code follows; it only works with one of them. I wonder if there is any way …

Re: Calculate average from Spark stream

2021-05-21 Thread Mich Talebzadeh
OK, where is your watermark created? That is the one that works out the average temperature! # construct a streaming dataframe streamingDataFrame that subscribes to topic temperature streamingDataFrame = self.spark \ .readStream \ .format("kaf…
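Picking up the streamingDataFrame from the quoted snippet, a sketch of where the watermark and the average would typically sit once the Kafka value payload has been parsed into columns; the column names "timestamp" and "temperature" and the durations are assumptions, not taken from the thread.

    from pyspark.sql import functions as F

    # streamingDataFrame stands for the Kafka stream from the quoted code,
    # after its value payload has been parsed into "timestamp"/"temperature".
    avg_temp = (
        streamingDataFrame
        .withWatermark("timestamp", "5 minutes")        # bound how late data may arrive
        .groupBy(F.window("timestamp", "2 minutes"))    # event-time window
        .agg(F.avg("temperature").alias("avg_temperature"))
    )

    query = (
        avg_temp.writeStream
        .outputMode("update")
        .format("console")
        .start()
    )
    query.awaitTermination()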

Re: DF blank value fill

2021-05-21 Thread ayan guha
Hi, You can do something like this: SELECT MainKey, SubKey, case when val1 is null then newval1 else val1 end val1, case when val2 is null then newval2 else val2 end val2, case when val3 is null then newval3 else val3 end val3 from (select mainkey, subkey, …
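The quoted SQL is cut off by the archive; one plausible completion (an assumption, not the original message) is that newval1..newval3 come from a per-group aggregate that skips NULLs, such as max() over a window. The view name my_table is made up, and the window partitions by MainKey alone because the desired output in the original post fills a value that only exists under a different SubKey; add SubKey to the PARTITION BY to keep fills within sub-groups.

    # Assumes the DataFrame was registered first: df.createOrReplaceTempView("my_table")
    filled = spark.sql("""
        SELECT MainKey, SubKey,
               CASE WHEN val1 IS NULL THEN newval1 ELSE val1 END AS val1,
               CASE WHEN val2 IS NULL THEN newval2 ELSE val2 END AS val2,
               CASE WHEN val3 IS NULL THEN newval3 ELSE val3 END AS val3
        FROM (
            SELECT MainKey, SubKey, val1, val2, val3,
                   max(val1) OVER (PARTITION BY MainKey) AS newval1,
                   max(val2) OVER (PARTITION BY MainKey) AS newval2,
                   max(val3) OVER (PARTITION BY MainKey) AS newval3
            FROM my_table
        ) t
    """)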

Re: [External Sender] Memory issues in 3.0.2 but works well on 2.4.4

2021-05-21 Thread Femi Anthony
Post the stack trace and provide some more details about your configuration. On Fri, May 21, 2021 at 7:52 AM Praneeth Shishtla wrote: > Hi, > I have a simple DecisionForest model and was able to train the model on > pyspark==2.4.4 without any issues. > However, when I upgraded to pyspark==3.0.2, …

Memory issues in 3.0.2 but works well on 2.4.4

2021-05-21 Thread Praneeth Shishtla
Hi, I have a simple DecisionForest model and was able to train it on pyspark==2.4.4 without any issues. However, when I upgraded to pyspark==3.0.2, the fit takes a lot of time and eventually errors out with an out-of-memory error. I even tried reducing the number of samples for training, but no luck.
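Not a fix confirmed in the thread, but a common first step when a local fit runs out of memory is to raise the driver memory before the JVM starts (the default spark.driver.memory is only 1g); the value and app name below are illustrative.

    from pyspark.sql import SparkSession

    # This only takes effect if it is applied before the JVM is launched,
    # i.e. not from inside an already-running pyspark shell.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("forest-training")
        .config("spark.driver.memory", "8g")
        .getOrCreate()
    )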

DF blank value fill

2021-05-21 Thread Bode, Meikel, NMA-CFD
Hi all, My df looks as follows.
Situation:
MainKey, SubKey, Val1, Val2, Val3, ...
1, 2, a, null, c
1, 2, null, null, c
1, 3, null, b, null
1, 3, a, null, c
Desired outcome:
1, 2, a, b, c
1, 2, a, b, c
1, 3, a, b, c
1, 3, a, b, c
How could I populate/synchronize empty cells of all records wi…
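A sketch of one way to get that outcome with the DataFrame API (not from the thread): take an aggregate of each column over the MainKey group, which skips NULLs, and use it wherever a cell is empty. The partition is MainKey alone because the desired output fills Val2 of the (1, 2) rows from a (1, 3) row; add SubKey to the partition if fills should stay within sub-groups.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.master("local[*]").appName("fill-blanks").getOrCreate()

    df = spark.createDataFrame(
        [(1, 2, "a", None, "c"),
         (1, 2, None, None, "c"),
         (1, 3, None, "b", None),
         (1, 3, "a", None, "c")],
        ["MainKey", "SubKey", "Val1", "Val2", "Val3"],
    )

    # Fill each null cell with the group-wide max of that column (nulls ignored).
    w = Window.partitionBy("MainKey")
    filled = df.select(
        "MainKey",
        "SubKey",
        *[F.coalesce(F.col(c), F.max(c).over(w)).alias(c) for c in ["Val1", "Val2", "Val3"]],
    )
    filled.show()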