Need help getting around these errors. I have a program that runs fine on smaller input sizes, but as the input grows, Spark has increasing trouble staying efficient and finishing without errors. We have about 46 GB free on each node, and the workers and executors are configured to use all of it (the only way to avoid Heap Space or GC overhead limit errors). On the driver, the data only uses about 1.2 GB of RAM and has the form matrix: RDD[(Integer, Array[Float])]. It is a column-major matrix with 15k rows and 20k columns. Each column takes about 4 bytes * 15k = 60 KB, and 60 KB * 20k columns = 1.2 GB, so the data is not even that large. Eventually I want to test 60k x 70k.
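In case it helps, here is roughly what that layout looks like in Scala. This is a simplified sketch written for this message, not my actual code; the object/method names and the placeholder column contents are made up.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Column-major matrix stored as (column index, column values) pairs.
object MatrixLayout {
  val numRows = 15000  // entries per column
  val numCols = 20000  // number of columns

  def buildMatrix(sc: SparkContext): RDD[(Integer, Array[Float])] =
    sc.parallelize(0 until numCols).map { j =>
      (Integer.valueOf(j), new Array[Float](numRows))  // placeholder column data
    }

  // Back-of-the-envelope size: 4 bytes * 15k rows ~= 60 KB per column,
  // and 60 KB * 20k columns ~= 1.2 GB for the whole matrix.
  val bytesPerColumn: Long = 4L * numRows
  val totalBytes: Long = bytesPerColumn * numCols
}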
The covariance matrix algorithm we are using is basically O(N^3); at minimum, the outer loop needs to be parallelized:

for each column i in matrix
  for each column j in matrix
    get the covariance between columns i and j

The covariance itself is practically just "for the two columns, compute the sum of products", which is O(N). There is no need to parallelize that inner part, since the pair-level loop already gives us enough work and this piece is small.

Since I can't figure out any other way to do a permutation or a nested for loop over an RDD, I had to call matrix.cartesian(matrix).map{ pair => ... }. (I'll paste a simplified sketch of this at the end of the message.) For comparison, I could do 5k x 5k (1/4th of the work) with a HashMap instead of an RDD and finish in 10 seconds. With the RDD version, partitioning at 3000 takes 18 hours, 300 takes 12 hours, 200 fails (error #1), and 16, which would be ideal, fails as well (error #2). Note that I set the Akka frame size (spark.akka.frameSize in spark-defaults.conf) to 15 to get around some of the other Akka errors.

This is error #1:

This is error #2:
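Here is the rough sketch of the cartesian-based pair computation I mentioned above. It is simplified, not my exact code; the covariance helper and the ((i, j), value) result type are just for illustration, and the exact normalization in my real code may differ.

import org.apache.spark.rdd.RDD

object CovarianceSketch {

  // Plain O(N) sample covariance of two equally sized columns.
  def covariance(x: Array[Float], y: Array[Float]): Double = {
    val n = x.length
    val meanX = x.map(_.toDouble).sum / n
    val meanY = y.map(_.toDouble).sum / n
    var acc = 0.0
    var k = 0
    while (k < n) {
      acc += (x(k) - meanX) * (y(k) - meanY)
      k += 1
    }
    acc / (n - 1)
  }

  // All-pairs covariance via cartesian: every (column i, column j) pair is
  // materialized and mapped to one entry of the covariance matrix.
  def covarianceEntries(matrix: RDD[(Integer, Array[Float])])
      : RDD[((Integer, Integer), Double)] =
    matrix.cartesian(matrix).map { case ((i, colI), (j, colJ)) =>
      ((i, j), covariance(colI, colJ))
    }
}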