You could convert your CSV file to an RDD of vectors, then use the Statistics class from MLlib (org.apache.spark.mllib.stat.Statistics) to compute the per-column aggregates.
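Something along these lines might work as a starting point. It is only a rough sketch: the file name data.csv, the class name CsvColumnStats, the parseLine helper, and the local master setting are all made up for illustration, and it assumes every column is numeric and the file has no header row.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;

public class CsvColumnStats {

    // Hypothetical helper: split one CSV line on commas and parse each field as a double.
    // Assumes purely numeric fields with no quoting or embedded commas.
    static double[] parseLine(String line) {
        String[] parts = line.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Local master and app name are placeholders for this sketch.
        SparkConf conf = new SparkConf()
                .setAppName("CsvColumnStats")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // "data.csv" is an assumed path; each line becomes one dense vector.
            JavaRDD<Vector> rows = sc.textFile("data.csv")
                    .map(line -> Vectors.dense(parseLine(line)));

            // colStats takes a Scala RDD, so unwrap the JavaRDD with rdd().
            MultivariateStatisticalSummary summary = Statistics.colStats(rows.rdd());

            System.out.println("count:    " + summary.count());
            System.out.println("mean:     " + summary.mean());
            System.out.println("variance: " + summary.variance());
            System.out.println("min:      " + summary.min());
            System.out.println("max:      " + summary.max());
        }
    }
}
```

Each value printed after the count is a vector with one entry per CSV column. If your file has a header or non-numeric columns, you would need to filter or map those out before building the vectors.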
Also, this should be on the user list, not the developer list.