I am new to Scala. I have a dataset with many named columns. Given
several column names (these are not fixed; they are generated dynamically),
I need to sum the values of those columns for each row. Is there an
efficient way of doing this?
I worked out a way using a for loop, but I don't think it is efficient:
val AllLabels = List("ID", "val1", "val2", "val3", "val4")
val lbla = List("val1", "val3", "val4")
// Positions of the requested columns within each row
val index_lbla = lbla.map(x => AllLabels.indexOf(x))
val dataRDD = sc.textFile("../test.csv").map(_.split(","))
dataRDD.map { x =>
  var sum = 0.0
  // Iterate over the selected column positions themselves,
  // not over 1 to index_lbla.length
  for (i <- index_lbla)
    sum = sum + x(i).trim.toDouble
  sum
}.collect
The test.csv file looks like this (the file itself does not contain the column names):
"ID", "val1", "val2", "val3", "val4"
A, 123, 523, 534, 893
B, 536, 98, 1623, 98472
C, 537, 89, 83640, 9265
D, 7297, 98364, 9, 735
...
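For concreteness, here is a minimal plain-Scala sketch (no Spark) of the per-row computation I am after, using two of the sample rows above. The object name `SumSelectedColumns` and the helper `sumSelected` are hypothetical names made up for this illustration, and the sketch assumes every requested column holds a numeric value.

```scala
object SumSelectedColumns {
  // Column names in file order (taken from the sample above)
  val allLabels = List("ID", "val1", "val2", "val3", "val4")

  // Hypothetical helper: sum the values of the named columns in one split row
  def sumSelected(row: Array[String], labels: Seq[String]): Double = {
    // Resolve each requested name to its position in the row
    val indices = labels.map(l => allLabels.indexOf(l))
    require(!indices.contains(-1), "unknown column name")
    // trim removes the space that follows each comma in the sample data
    indices.map(i => row(i).trim.toDouble).sum
  }

  def main(args: Array[String]): Unit = {
    val rows = List("A, 123, 523, 534, 893", "B, 536, 98, 1623, 98472")
    val wanted = List("val1", "val3", "val4")
    rows.map(_.split(",")).map(r => sumSelected(r, wanted)).foreach(println)
  }
}
```

For row A this should give 123 + 534 + 893 = 1550.0, and for row B 536 + 1623 + 98472 = 100631.0. Resolving the indices once outside the per-row function, rather than on every call as here, would probably be the better choice on a large dataset.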
Your help is very much appreciated!