Hi, I posted a question yesterday and have tried out all the options suggested in the responses.
Basically, I am reading a very wide matrix (2000 x 500000) and trying to run k-means on it, but I keep getting heap (OutOfMemory) errors. I am now even using the persist(StorageLevel.DISK_ONLY_2) option. How do I process a file this large? The conf I am currently using is:

    conf = SparkConf() \
        .set("spark.executor.memory", "16g") \
        .set("spark.akka.frameSize", "100000000") \
        .set("spark.driver.memory", "4g") \
        .set("spark.rdd.compress", "true")

Thanks
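For reference, here is a rough sketch of what I am doing. The input path, the line-parsing logic, and the k / maxIterations values are placeholders, not my actual ones; the conf is the same as above.

    from pyspark import SparkConf, SparkContext, StorageLevel
    from pyspark.mllib.clustering import KMeans

    conf = (SparkConf()
            .set("spark.executor.memory", "16g")
            .set("spark.akka.frameSize", "100000000")
            .set("spark.driver.memory", "4g")
            .set("spark.rdd.compress", "true"))
    sc = SparkContext(conf=conf)

    # Each line of the input file is one row of the 2000 x 500000 matrix,
    # with space-separated values (placeholder path and format).
    rows = (sc.textFile("hdfs:///path/to/matrix.txt")
              .map(lambda line: [float(x) for x in line.split()]))

    # Spill partitions to disk (replicated on 2 nodes) instead of holding them in memory.
    rows.persist(StorageLevel.DISK_ONLY_2)

    # k and maxIterations are placeholders.
    model = KMeans.train(rows, k=10, maxIterations=20)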