Re: Nested DataFrames

2015-06-25 Thread pawan kumar
Maybe you could try something like this, using Spark SQL 1.4 and DataFrames:

    student.join(grade, grade("student_id") === student("student_id"), "left")
      .groupBy("id")
      .agg(sum(grade("Marks")), avg(grade("Marks")))

You could refer to the following document: https://spark.apache.o
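For anyone who wants to try the suggestion end to end, here is a minimal sketch against the Spark 1.4-era API (SQLContext). The case classes `Student` and `Grade`, the `name` and `marks` columns, the sample rows, and the local master are assumptions made up for illustration; the thread does not show the real schemas. It groups by `student_id` because the `id` column from the snippet above is not defined anywhere in the thread, and it uses the documented join type string "left_outer".

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{avg, sum}

object StudentGrades {
  // Hypothetical schemas; the real ones are not shown in the thread.
  case class Student(student_id: Int, name: String)
  case class Grade(student_id: Int, marks: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("student-grades").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables .toDF() on RDDs of case classes

    // Made-up sample data: student 2 has no grades on purpose,
    // so the outer join keeps a row with null aggregates for it.
    val student = sc.parallelize(Seq(Student(1, "Ann"), Student(2, "Bob"))).toDF()
    val grade = sc.parallelize(Seq(Grade(1, 90.0), Grade(1, 70.0))).toDF()

    // Left outer join on student_id, then one row per student
    // with the sum and average of that student's marks.
    val summary = student
      .join(grade, student("student_id") === grade("student_id"), "left_outer")
      .groupBy(student("student_id"))
      .agg(sum(grade("marks")), avg(grade("marks")))

    summary.show()
    sc.stop()
  }
}
```

Referring to columns through their parent DataFrame (`student("student_id")`, `grade("marks")`) avoids ambiguity after the join, since both sides carry a `student_id` column.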

RE: Nested DataFrames

2015-06-25 Thread Richard Catlin
I am looking to do something similar to this Postgres query in HiveQL. If I have a DataFrame student and a DataFrame grade, is this possible? I read in Learning Spark: Lightning-Fast Big Data Analysis that it should be possible. It says in Chapter 9 "SchemaRDDs can store several basic types, as