Hi Jian,
I found this link that could be useful.
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
By the way, you could also try giving enough resources to run both jobs
without defining the scheduler. I mean, run the queries with the default
scheduler, but provide enough resources for both.
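If you do end up using the FAIR scheduler, the page linked above also describes pool configuration via an allocation file. A minimal fairscheduler.xml sketch — the pool name "streaming" and the weight/minShare values below are illustrative, not from this thread:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="streaming">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

You point Spark at the file with the spark.scheduler.allocation.file config property, and assign the thread that starts a query to a pool with sc.setLocalProperty("spark.scheduler.pool", "streaming").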
Hi Amit;
Further to my last email, I managed to set the scheduler to FAIR via code:

conf = SparkConf().setMaster("local").setAppName("HSMSTest1").set("spark.scheduler.mode", "FAIR")
I can see the mode has changed in the web view, though the result is the
same; it does not work out. And it mig…
Hi Amit;
Thank you for your prompt reply and kind help. I wonder how to set the
scheduler to FAIR mode in Python. The following code does not seem to work:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("HSMSTest1")
sc = SparkContext(conf=conf)
sc.setLocalProperty('spark.scheduler.…
Hi Jian,
You have to use the same Spark session to run all the queries,
and use the following to wait for termination:
q1 = writestream.start()
q2 = writestream2.start()
spark.streams.awaitAnyTermination()
Also, set the scheduler to FAIR in the Spark config.
Regards
Amit Joshi
On Saturday…
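Amit's advice can be sketched end to end. This is an illustrative fragment, not a drop-in script: it assumes a local Spark installation, uses the built-in rate source and console sink as stand-ins for the real Kafka sensor pipeline, and blocks forever on awaitAnyTermination.

```python
from pyspark.sql import SparkSession

# One session shared by both queries; FAIR scheduling set in the config.
spark = (SparkSession.builder
         .master("local[4]")          # enough cores to run both queries
         .appName("TwoStreams")
         .config("spark.scheduler.mode", "FAIR")
         .getOrCreate())

# Built-in test source; in the real app this would be the Kafka reader.
df = (spark.readStream
      .format("rate")
      .option("rowsPerSecond", 5)
      .load())

# Start both queries from the SAME session, then wait on either one.
q1 = df.writeStream.format("console").queryName("q1").start()
q2 = df.writeStream.format("console").queryName("q2").start()
spark.streams.awaitAnyTermination()
```

The key points are that both start() calls happen before any await, and that the wait is awaitAnyTermination() on the session's StreamingQueryManager rather than q1.awaitTermination(), which would block the second query from ever being reached in sequential code.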
Hi There;
I am new to Spark. We are using Spark to develop our app for streaming
sensor data.

I am having trouble getting two structured streaming queries to work
concurrently.

Following is the code. It can only work with one of them. I wonder if there
is any way…
OK, where is your watermark created? That is the one that works out the
average temperature!
# construct a streaming DataFrame streamingDataFrame that
# subscribes to the topic "temperature"
streamingDataFrame = self.spark \
    .readStream \
    .format("kafka…
Hi
You can do something like this:
SELECT MainKey, SubKey,
  CASE WHEN val1 IS NULL THEN newval1 ELSE val1 END AS val1,
  CASE WHEN val2 IS NULL THEN newval2 ELSE val2 END AS val2,
  CASE WHEN val3 IS NULL THEN newval3 ELSE val3 END AS val3
FROM (SELECT MainKey, SubKey, …
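Since Spark SQL also supports window functions, the same fill can be expressed as COALESCE over a per-group MAX. Here is a small self-contained sketch of that logic using plain Python's sqlite3 rather than Spark, with the sample rows from the question below; note that group (1, 2) has no non-null Val2 anywhere, so it stays NULL rather than becoming 'b':

```python
import sqlite3

# Sample rows from the question: (MainKey, SubKey, Val1, Val2, Val3)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (MainKey INT, SubKey INT, Val1 TEXT, Val2 TEXT, Val3 TEXT);
    INSERT INTO t VALUES
        (1, 2, 'a', NULL, 'c'),
        (1, 2, NULL, NULL, 'c'),
        (1, 3, NULL, 'b', NULL),
        (1, 3, 'a', NULL, 'c');
""")

# For each (MainKey, SubKey) group, replace NULLs with a non-null value
# from the same group; MAX over the partition picks one such value.
rows = conn.execute("""
    SELECT MainKey, SubKey,
           COALESCE(Val1, MAX(Val1) OVER (PARTITION BY MainKey, SubKey)) AS Val1,
           COALESCE(Val2, MAX(Val2) OVER (PARTITION BY MainKey, SubKey)) AS Val2,
           COALESCE(Val3, MAX(Val3) OVER (PARTITION BY MainKey, SubKey)) AS Val3
    FROM t
    ORDER BY MainKey, SubKey
""").fetchall()

for row in rows:
    print(row)
```

The same COALESCE(...) OVER (PARTITION BY ...) expression works in Spark SQL. MAX simply picks one non-null value per group, so if a group can hold two different non-null values for the same column you would need an explicit tie-break instead.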
Post the stack trace and provide some more details about your configuration.
On Fri, May 21, 2021 at 7:52 AM Praneeth Shishtla wrote:
> Hi,
> I have a simple DecisionForest model and was able to train the model on
> pyspark==2.4.4 without any issues.
> However, when I upgraded to pyspark==3.0.2, …
Hi,
I have a simple DecisionForest model and was able to train the model on
pyspark==2.4.4 without any issues.
However, when I upgraded to pyspark==3.0.2, the fit takes a long time and
eventually errors out with an out-of-memory error. I even tried reducing the
number of training samples, but no luck.
Hi all,
My df looks as follows:
Situation:
MainKey, SubKey, Val1, Val2, Val3, ...
1, 2, a, null, c
1, 2, null, null, c
1, 3, null, b, null
1, 3, a, null, c
Desired outcome:
1, 2, a, b, c
1, 2, a, b, c
1, 3, a, b, c
1, 3, a, b, c
How could I populate/synchronize empty cells of all records wi…