If you have already loaded the CSV data into a DataFrame, why not register it as a temporary table and use Spark SQL to find the max/min or any other aggregates? SELECT MAX(column_name) FROM dftable_name ... seems natural.
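For example, a minimal sketch against the Spark 1.x Java API; "sqlContext", "df", and the table/column names are placeholders standing in for your own:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Assumes df is the DataFrame you already loaded from the CSV.
df.registerTempTable("dftable_name");

// Any SQL aggregate works here: MAX, MIN, AVG, ...
DataFrame stats = sqlContext.sql(
    "SELECT MAX(column_name) AS max_val, MIN(column_name) AS min_val " +
    "FROM dftable_name");
stats.show();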
JESSE CHEN
Big Data Performance | IBM Analytics
Office: 408 463 2296
Mobile: 408 828 9068
Email: [email protected]
From: ashensw <[email protected]>
To: [email protected]
Date: 08/28/2015 05:40 AM
Subject: Calculating Min and Max Values using Spark Transformations?
Hi all,
I have a dataset which consists of a large number of features (columns). It
is in CSV format, so I loaded it into a Spark DataFrame. Then I converted it
into a JavaRDD<Row>, then, using a Spark transformation, into a
JavaRDD<String[]>, and then again into a JavaRDD<double[]>. So now I have a
JavaRDD<double[]>. Is there any method to calculate the max and min values
of each column in this JavaRDD<double[]>?
Or is there any way to access the array if I store the max and min values in
an array inside the Spark transformation class?
Thanks.
