Hi Balakumar,
Two things.
One - It looks like your cluster is running out of memory and then
eventually out of disk, likely while materializing the dataframe to write
(what is the volume of data produced by the join?).
Two - Your job is running in local mode, and so is able to utilize just the
master node.
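For example, a minimal sketch of pointing the session at the cluster manager instead of local mode (the master URL and sizing values here are assumptions, adjust them for your setup):

from pyspark.sql import SparkSession

# Assumed standalone master URL; substitute your actual cluster manager.
spark = (SparkSession.builder
         .appName("join-job")
         .master("spark://node1:7077")
         .config("spark.executor.memory", "16g")   # placeholder sizing
         .config("spark.executor.cores", "4")      # placeholder sizing
         .getOrCreate())

Equivalently, pass the cluster master via --master to spark-submit rather than running with local[*].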
Hi,
While running the following Spark code on a cluster with the following
configuration, it is spread across 3 job IDs.
CLUSTER CONFIGURATION
3 NODE CLUSTER
NODE 1 - 64GB 16 CORES
NODE 2 - 64GB 16 CORES
NODE 3 - 64GB 16 CORES
At job ID 2 the job is stuck at stage 51 of 254 and then it starts
ut
Why not save the dataframe to persistent storage (S3/HDFS) in the first
application and read it back in the second?
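A minimal sketch of that approach, where df and spark stand for the dataframe and session in each application (path and format are placeholders):

# First application: persist the dataframe to shared storage.
df.write.mode("overwrite").parquet("s3a://some-bucket/shared/df")

# Second application: read it back with its own SparkSession.
df = spark.read.parquet("s3a://some-bucket/shared/df")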
On Tue, Apr 16, 2019 at 8:58 PM Rishikesh Gawade wrote:
> Hi.
> I wish to use a SparkSession created by one app in another app so that I
> can use the dataframes belonging to that session.
Hi,
Not possible. What are you really trying to do? Why do you need to share
dataframes? They're nothing but the metadata of a distributed computation (no
data inside), so what would be the purpose of such sharing?
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
You can't; SparkContext is a singleton object. You have to use the Hadoop
library or an AWS client to read files on S3.
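For example, a rough sketch of the AWS-client route using boto3 (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="some-bucket", Key="path/to/file.json.gz")
raw_bytes = obj["Body"].read()   # raw (still gzipped) object bytes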
Hello,
Is Kubernetes dynamic executor scaling for Spark available in the latest
release of Spark?
I mean scaling the executors based on the workload vs. preallocating a number
of executors for a Spark job.
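Something along the lines of the standard dynamic allocation settings, e.g. (values are placeholders, and whether this is fully supported on Kubernetes may depend on the release):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "1")    # placeholder
         .config("spark.dynamicAllocation.maxExecutors", "10")   # placeholder
         .getOrCreate())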
Thanks,
Purna
Hi.
I wish to use a SparkSession created by one app in another app so that I
can use the dataframes belonging to that session. Is it possible to use the
same SparkSession in another app?
Thanks,
Rishikesh
Hi,
I am trying to read gzipped JSON data from S3. My idea would be to do:

data = (s3_keys
        .mapValues(lambda x: s3_read_data(x)))

For that I thought about using sc.textFile instead of s3_read_data, but it
wouldn't work. Any idea how to achieve a solution here?
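For reference, what I had in mind with sc.textFile is roughly this (the s3a path is a placeholder; gzip should be decompressed transparently):

rdd = sc.textFile("s3a://some-bucket/path/*.json.gz")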
C
Environment:
Spark: 2.4.0
Kubernetes: 1.14
Query: Does the application jar need to be part of both the Driver and
Executor image?
Invocation point (from Java code):
sparkLaunch = new SparkLauncher()
.setMaster(LINUX_MASTER)