Re: KMeans Input Format

2014-08-09 Thread AlexanderRiggers
Thank you for your help. After restructuring my code according to Sean's input, it worked without changing the Spark context. I now took the same file format, just a bigger file (2.7 GB), from S3 to my cluster with 4 c3.xlarge instances and Spark 1.0.2. Unluckily, my task freezes again after a short time. I tried it w
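
(A sketch of reading the bigger file from S3 with more, smaller partitions and a spill-to-disk storage level, which can help when a cached dataset no longer fits comfortably in memory; the bucket name and partition count are hypothetical:)

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.mllib.linalg.Vectors

  // 120 input partitions keeps individual tasks small on 4 c3.xlarge nodes
  val data = sc.textFile("s3n://my-bucket/train.txt", 120)
  val parsed = data.map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  parsed.persist(StorageLevel.MEMORY_AND_DISK) // spill to disk instead of recomputing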

Re: KMeans Input Format

2014-08-08 Thread AlexanderRiggers
Thanks for your answers. I added some lines to my code and it went through, but now I get an error message from my computeCost call... scala> val WSSSE = model.computeCost(train)
14/08/08 15:48:42 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(, 192.168.0.33, 49242, 0) with n
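
(For reference, computeCost returns the within-set sum of squared errors over the given points; a minimal sketch, assuming `model` and the cached `train` RDD from the earlier posts:)

  // WSSSE = sum of squared distances from each point to its closest center
  val WSSSE = model.computeCost(train)
  println("Within Set Sum of Squared Errors = " + WSSSE)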

Re: KMeans Input Format

2014-08-07 Thread AlexanderRiggers
Thanks for your answers. The dataset is only 400 MB, so I shouldn't run out of memory. I have now restructured my code, because I had forgotten to cache my dataset, and reduced the number of iterations to 2, but I still get kicked out of Spark. Did I cache the data incorrectly (sorry, not an expert)?: scala> import org.apac
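
(A sketch of the usual caching pattern before an iterative MLlib job; the names are hypothetical, and `parsed` is assumed to be the RDD of Vectors built from the input file:)

  val train = parsed.cache()            // mark the RDD for in-memory storage
  train.count()                         // an action forces the cache to fill now
  val model = KMeans.train(train, 2, 2) // k = 2, only 2 iterations while debugging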

KMeans Input Format

2014-08-07 Thread AlexanderRiggers
I want to perform a K-Means task but fail to train the model and get kicked out of Spark's Scala shell before I get my result metrics. I am not sure if the input format is the problem or something else. I use Spark 1.0.0 and my input text file (400 MB) looks like this: 86252 3711 15.4 4.18 86252 3504
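
(For context, a minimal KMeans run on input of this shape might look like the sketch below; the path and the values of k and maxIterations are hypothetical, and each line is assumed to hold space-separated numeric features:)

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Hypothetical path; each line: space-separated numeric features
  val data = sc.textFile("data/train.txt")
  val parsed = data.map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  parsed.cache()                          // KMeans is iterative, so cache first
  val model = KMeans.train(parsed, 2, 20) // k = 2, maxIterations = 20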

GraphX Pagerank application

2014-08-06 Thread AlexanderRiggers
I want to use PageRank on a 3 GB text file, which contains a bipartite edge list with the variables "id" and "brand". Example: id,brand 86246,15343 86246,27873 86246,14647 86246,55172 86246,3293 86246,2820 86246,3830 86246,2820 86246,5603 86246,72482 To perform the PageRank I have to create a graph object
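
(A minimal sketch of building a graph from a comma-separated edge list and running PageRank; the path is hypothetical, the header line is dropped, and GraphLoader.edgeListFile is avoided because it expects whitespace-separated pairs. Note that if "id" and "brand" values can collide numerically, one side should be offset into a disjoint vertex-id range:)

  import org.apache.spark.graphx._

  val edges = sc.textFile("data/bipartite.csv")
    .filter(!_.startsWith("id"))          // drop the "id,brand" header line
    .map { line =>
      val f = line.split(',')
      Edge(f(0).toLong, f(1).toLong, 1)   // edge attribute unused here
    }
  val graph = Graph.fromEdges(edges, defaultValue = 1)
  val ranks = graph.pageRank(0.0001).vertices // tolerance-based convergence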

Re: Terminal freeze during SVM

2014-07-16 Thread AlexanderRiggers
So I need to reconfigure my SparkContext this way:

  val conf = new SparkConf()
    .setMaster("local")
    .setAppName("CountingSheep")
    .set("spark.executor.memory", "1g")
    .set("spark.akka.frameSize", "20")
  val sc = new SparkContext(conf)

And start a new

Re: Terminal freeze during SVM

2014-07-10 Thread AlexanderRiggers
Tried the newest branch, but still get stuck on the same task: (kill) runJob at SlidingRDD.scala:74

Re: Terminal freeze during SVM

2014-07-09 Thread AlexanderRiggers
By latest branch, do you mean Apache Spark 1.0.0? And what do you mean by master? Because I am using v1.0.0. - Alex

Re: Sample datasets for MLlib and Graphx

2014-07-03 Thread AlexanderRiggers
Nick Pentreath wrote:
> Take a look at Kaggle competition datasets
> - https://www.kaggle.com/competitions

I was looking for files in LIBSVM format and never found anything of a larger size on Kaggle. Most competitions I've seen need data processing and feature generation, but maybe I've to take a s
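
(For what it's worth, MLlib can load LIBSVM-format files directly; a minimal sketch with a hypothetical path:)

  import org.apache.spark.mllib.util.MLUtils

  // Returns an RDD[LabeledPoint] parsed from LIBSVM's "label index:value" lines
  val examples = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")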

Sample datasets for MLlib and Graphx

2014-07-03 Thread AlexanderRiggers
Hello! I want to play around with several different cluster settings and measure performance for MLlib and GraphX, and was wondering if anybody here could hit me up with datasets for these applications of 5 GB and up? I'm mostly interested in SVM and Triangle Count, but would be glad for any he