If I am using GraphX on Spark and I have created a graph that gets used a lot
later, do I want to cache the graph? Or do I want to cache the vertices and
edges (the actual data) that I use to create the graph?
e.g.
val graph = Graph(vertices, edges)
graph.blahblahblah
graph.blahblahblah
graph.blahblahblah
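For reference, a minimal sketch of the first option (caching the constructed
graph itself rather than the input RDDs), assuming spark-shell where sc
already exists; the tiny input data here is just a placeholder:

import org.apache.spark.graphx.{Edge, Graph}

// Placeholder input, just to make the sketch self-contained.
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))

// Cache the materialized graph itself; later operations all start from it,
// and the input RDDs are only read once during construction.
val graph = Graph(vertices, edges).cache()

val numVertices = graph.numVertices     // first action materializes and caches the graph
val degreeCount = graph.degrees.count() // reuses the cached graph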
After doing that, I ran my code once with a smaller example and it worked.
But ever since then, I get the "No space left on device" message for the
same sample, even if I restart the master...
ERROR TaskSetManager: Task 29.0:20 failed 4 times; aborting job
org.apache.spark.SparkException: Job ab
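One thing that can produce "No space left on device" is shuffle spill filling
up the default scratch directory (usually /tmp). A minimal sketch of pointing
spark.local.dir at a larger disk, assuming you build your own SparkContext;
the path below is just a placeholder and must exist on every node:

import org.apache.spark.{SparkConf, SparkContext}

// Put Spark's scratch space (shuffle files, spilled data) on a disk with room.
// "/scratch/spark-tmp" is a placeholder path.
val conf = new SparkConf()
  .setAppName("graphx-job")
  .set("spark.local.dir", "/scratch/spark-tmp")

val sc = new SparkContext(conf)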
Ok. I tried setting the partition number to 128 and numbers greater than 128,
and now I get another error message about "Java heap space". Is it possible
that there is something wrong with the setup of my Spark cluster to begin
with? Or is it still an issue with partitioning my data? Or do I just n
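If it helps to narrow things down, you can check how many partitions the data
actually ended up with before blaming the cluster setup; a quick sketch, using
the same file path as later in this thread:

// How many partitions did the load actually produce?
val data = sc.textFile("somedirectory/data.csv", 128)
println(data.partitions.length)

// Spread the data over more, smaller partitions if tasks still run out of heap.
val repartitioned = data.repartition(512)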
How do you determine the number of partitions? For example, I have 16
workers, and the number of cores and the worker memory set in spark-env.sh
are:
CORE = 8
MEMORY = 16g
The .csv data I have is about 500MB, but I am eventually going to use a file
that is about 15GB.
Is the MEMORY variable in s
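A common rule of thumb (not a hard requirement) is to aim for roughly 2-4
partitions per core in the cluster, so with the numbers above:

val totalCores = 16 * 8            // 16 workers x 8 cores each = 128 cores
val lowerBound = totalCores * 2    // ~256 partitions
val upperBound = totalCores * 4    // ~512 partitions

// With HDFS-style 64-128MB splits, a 15GB file would naturally come out
// around 120-240 partitions anyway.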
Spark is running fine, but I get this message. Does this mean that my data is
just too big?
14/04/22 17:06:20 ERROR TaskSchedulerImpl: Lost executor 2 on WORKER#2:
OutOfMemoryError
14/04/22 17:06:20 ERROR TaskSetManager: Task 550.0:2 failed 4 times;
aborting job
org.apache.spark.SparkException
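If the executors are running out of heap, one knob is spark.executor.memory;
a minimal sketch assuming standalone mode, where the 12g value is just a
placeholder and has to fit inside SPARK_WORKER_MEMORY:

import org.apache.spark.{SparkConf, SparkContext}

// Raise the executor heap; with 16g per worker, something below 16g fits.
val conf = new SparkConf()
  .setAppName("graphx-job")
  .set("spark.executor.memory", "12g")

val sc = new SparkContext(conf)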
Wow, it worked! Thank you so much!
So now, all I need to do is put the number of workers that I want to use
when I read the data, right?
e.g.
val numWorkers = 10
// second argument: a hint for the minimum number of partitions to split the file into
val data = sc.textFile("somedirectory/data.csv", numWorkers)
No, I am not using AWS. I am using one of the national lab's clusters. But
as I mentioned, I am pretty new to computer science, so I might not be
answering your question right... but port 7077 is accessible.
Maybe I got it wrong from the get-go? I will just write down what I did...
Basically I have
Hi, I am trying to set up my own standalone Spark cluster, and I started the
master node and the worker nodes. Then I ran ./bin/spark-shell, and I get this message:
14/04/21 16:31:51 ERROR TaskSchedulerImpl: Lost an executor 1 (already
removed): remote Akka client disassociated
14/04/21 16:31:51 ERROR TaskSch
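For reference, a minimal sketch of pointing a SparkContext explicitly at the
standalone master on port 7077, as a sanity check that the master and workers
can actually run tasks for each other; the hostname is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

// Connect explicitly to the standalone master; replace the host with yours.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")
  .setAppName("spark-shell-test")

val sc = new SparkContext(conf)

// Quick check that executors can run tasks at all.
println(sc.parallelize(1 to 100).sum())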