Your latest response doesn't show up here yet; I only got the email. I'll still answer here in the hope that it appears later:
Which memory setting do you mean? I can go up a bit with spark.executor.memory; it's currently set to 12G, but that's already far more than the whole SchemaRDD of Vectors I use for training, which shouldn't be more than a few hundred MB. I suppose you rather mean something comparable to SHARK_MASTER_MEM in Shark, but I can't find the Spark equivalent in the documentation.

If it helps, I can summarize the code I'm currently using. It's nothing really fancy at the moment; I'm just trying to classify Strings that each contain a few words (each word is treated as an atomic item).
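For reference, a stripped-down sketch of the kind of job I mean is below. The feature-space size, input path, k, and iteration count are just placeholders rather than my real values, and in the actual job the training vectors come out of a SchemaRDD instead of a plain text file, but the shape is the same:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val conf = new SparkConf()
      .setAppName("KMeansWordStrings")
      .set("spark.executor.memory", "12g")   // the setting mentioned above
    val sc = new SparkContext(conf)

    val numFeatures = 10000   // placeholder size of the hashed feature space

    // One short string of words per input line; hash every word into a bucket
    // and use the bucket counts as a sparse feature vector.
    val vectors = sc.textFile("hdfs:///path/to/strings.txt").map { line =>
      val counts = scala.collection.mutable.Map[Int, Double]()
      line.split("\\s+").foreach { w =>
        val idx = math.abs(w.hashCode) % numFeatures
        counts(idx) = counts.getOrElse(idx, 0.0) + 1.0
      }
      Vectors.sparse(numFeatures, counts.toSeq)
    }.cache()

    // Train MLlib KMeans; k = 20 and maxIterations = 50 are placeholders too.
    val model = KMeans.train(vectors, 20, 50)
    model.clusterCenters.take(3).foreach(println)

    sc.stop()

So the training input is really just one small sparse vector of hashed word counts per input string, which is why I'm surprised that memory would be the bottleneck.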