Maybe you could try something like this using Spark SQL 1.4 and DataFrames:
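import org.apache.spark.sql.functions.{avg, sum}

// Keep every student, even those with no rows in grade, then aggregate per student.
student.join(grade, grade("student_id") === student("student_id"), "left_outer")
  .groupBy("id")
  .agg(sum(grade("Marks")), avg(grade("Marks")))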
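If it helps to see what that left join plus sum/avg should produce, here is a rough plain-Scala sketch that needs no Spark. The case classes, the `aggregate` helper and the sample rows are all made up for illustration; only the column meanings follow the snippet above.

```scala
// Hypothetical row types; the fields mirror the columns used in the snippet above.
case class Student(id: Int, name: String)
case class Grade(studentId: Int, marks: Double)

object LeftJoinAggSketch {
  // Returns (student id, sum of marks, average of marks) for every student.
  // Students without any grade get None, where Spark would show null.
  def aggregate(students: Seq[Student],
                grades: Seq[Grade]): Seq[(Int, Option[Double], Option[Double])] = {
    // Collect the marks per student once, so each lookup below is cheap.
    val marksByStudent: Map[Int, Seq[Double]] =
      grades.groupBy(_.studentId).map { case (id, gs) => id -> gs.map(_.marks) }

    // Left join: iterate over students, not grades, so nobody is dropped.
    students.map { s =>
      val marks = marksByStudent.getOrElse(s.id, Seq.empty[Double])
      val total = if (marks.isEmpty) None else Some(marks.sum)
      val mean = total.map(_ / marks.size)
      (s.id, total, mean)
    }
  }

  def main(args: Array[String]): Unit = {
    val students = Seq(Student(1, "Ann"), Student(2, "Bob"))
    val grades = Seq(Grade(1, 80.0), Grade(1, 90.0))
    aggregate(students, grades).foreach(println)
  }
}
```

The second student has no grades but still shows up in the result, which is the point of using a left join rather than an inner one.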
You could refer to the following documentation:
https://spark.apache.o
I am looking to do something similar to this Postgres query in HiveQL. If
I have a DataFrame student and a DataFrame grade, is this possible?
I read in Learning Spark: Lightning-Fast Big Data Analysis that it should
be possible. It says in Chapter 9:
"SchemaRDDs can store several basic types, as