Thank you for your help. After restructuring my code according to Sean's input,
it worked without changing the Spark context. I then took the same file format,
just a bigger file (2.7GB), from S3 to my cluster with 4 c3.xlarge instances and
Spark 1.0.2. Unluckily, my task freezes again after a short time. I tried it
w
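For reference, this is roughly how a file like that can be read from S3 in the
1.0.x shell; the bucket, path and partition count below are placeholders, and
sc is the shell's built-in context.

val raw = sc.textFile("s3n://my-bucket/input.txt", 32)  // hypothetical path; ask for extra partitions up front
raw.take(5).foreach(println)                            // cheap sanity check before the real job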
Thanks for your answers. I added some lines to my code and it went through,
but I get an error message for my computeCost function now...
scala> val WSSSE = model.computeCost(train)
14/08/08 15:48:42 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(, 192.168.0.33, 49242, 0) with n
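For context, a minimal sketch of how computeCost is normally called after
training, assuming train is a cached RDD[Vector]; the k and iteration values
are placeholders:

import org.apache.spark.mllib.clustering.KMeans

val model = KMeans.train(train, 2, 2)    // k = 2, maxIterations = 2 (placeholders)
val WSSSE = model.computeCost(train)     // Within-Set Sum of Squared Errors over the training data
println("WSSSE: " + WSSSE)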
Thanks for your answers. The dataset is only 400MB, so I shouldn't run out of
memory. I restructured my code now because I had forgotten to cache my dataset,
and I reduced the number of iterations to 2, but I still get kicked out of
Spark. Did I cache the data wrong (sorry, not an expert):
scala> import org.apac
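For reference, a minimal caching sketch, assuming data is the parsed RDD (the
names are placeholders). cache() is lazy, so nothing is stored until an action
runs:

val train = data.cache()          // mark the RDD for caching
train.count()                     // first action actually materialises the cache
println(train.getStorageLevel)    // should report a memory-based storage level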
I want to perform a K-Means task but fail to train the model: I get kicked out
of Spark's Scala shell before I get my result metrics. I am not sure whether
the input format is the problem or something else. I use Spark 1.0.0 and my
input textfile (400MB) looks like this:
86252 3711 15.4 4.18 86252 3504
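For reference, a minimal sketch of how such a space-separated file is usually
parsed and fed to MLlib K-Means; the path, k and iteration count are
placeholders:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val raw = sc.textFile("kmeans_input.txt")   // hypothetical path; sc is the shell's context
val train = raw.map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble))).cache()
val model = KMeans.train(train, 2, 20)      // k = 2, 20 iterations (placeholders)
println("WSSSE: " + model.computeCost(train))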
I want to use PageRank on a 3GB textfile, which contains a bipartite list
with the variables "id" and "brand".
Example:
id,brand
86246,15343
86246,27873
86246,14647
86246,55172
86246,3293
86246,2820
86246,3830
86246,2820
86246,5603
86246,72482
To perform the PageRank I have to create a graph object,
so I need to reconfigure my SparkContext this way:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
  .set("spark.akka.frameSize", "20")
val sc = new SparkContext(conf)
And start a new
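A minimal sketch of building the graph from that edge list and running GraphX
PageRank; the path is a placeholder, and skipping the header plus offsetting
the brand ids (so they don't collide with user ids in the shared vertex-id
space) are my own assumptions:

import org.apache.spark.graphx.{Edge, Graph}

val lines = sc.textFile("bipartite.csv")                 // hypothetical path
val edges = lines.filter(!_.startsWith("id")).map { line =>
  val Array(id, brand) = line.split(",")
  Edge(id.toLong, brand.toLong + 1000000000L, 1)         // offset brand ids (assumption)
}
val graph = Graph.fromEdges(edges, defaultValue = 0)
val ranks = graph.pageRank(0.0001).vertices              // tolerance-based PageRank
ranks.take(10).foreach(println)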
I tried the newest branch, but I still get stuck on the same task: (kill) runJob
at SlidingRDD.scala:74
By the latest branch do you mean Apache Spark 1.0.0? And what do you mean by
master? Because I am using v1.0.0. - Alex
Nick Pentreath wrote
> Take a look at Kaggle competition datasets
> - https://www.kaggle.com/competitions
I was looking for files in LIBSVM format and never found anything of a bigger
size on Kaggle. Most competitions I've seen need data processing and feature
generation, but maybe I have to take a s
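For reference, a minimal sketch of loading a LIBSVM-format file with MLlib and
training an SVM; the path and iteration count are placeholders:

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.SVMWithSGD

val data = MLUtils.loadLibSVMFile(sc, "data.libsvm").cache()   // hypothetical path
val model = SVMWithSGD.train(data, 100)                        // 100 iterations (placeholder)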
Hello!
I want to play around with several different cluster settings and measure
performance for MLlib and GraphX, and was wondering if anybody here could
hit me up with datasets for these applications from 5GB onwards?
I'm mostly interested in SVM and Triangle Count, but would be glad for any
he