Maybe you could try something like this:

    SparkSession sparkSession = SparkSession
        .builder()
        .appName("Rows2DataSet")
        .master("local")
        .getOrCreate();
    List<Row> results = new LinkedList<Row>();
    JavaRDD<Row> jsonRDD =
        new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
    Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
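
One caveat, as far as I know: the createDataFrame overloads that take a Class argument expect a JavaBean class, so passing Row.class may not produce the columns you want. A variant that might work better is to pass the rows together with an explicit schema, which should also keep the data types. Below is a minimal sketch under that assumption. The column names "name" and "value" and the class name RowsToDataFrame are made up for illustration, since the original post does not name them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RowsToDataFrame {

    // Hypothetical column names; one string column and one integer column,
    // matching the (String, Integer) rows built in the question.
    static final StructType SCHEMA = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("name", DataTypes.StringType, false),
        DataTypes.createStructField("value", DataTypes.IntegerType, false)));

    // Builds a DataFrame from generic rows using the explicit schema above.
    static Dataset<Row> toDataFrame(SparkSession spark, List<Row> rows) {
        return spark.createDataFrame(rows, SCHEMA);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("Rows2DataSet")
            .master("local")
            .getOrCreate();

        // Sample data standing in for the results of the calculations.
        List<Row> results = new ArrayList<>();
        results.add(RowFactory.create("a", 1));
        results.add(RowFactory.create("b", 2));
        results.add(RowFactory.create("a", 3));

        Dataset<Row> df = toDataFrame(spark, results);
        df.printSchema();

        // Grouping and sorting, as mentioned in the question.
        df.groupBy("name").sum("value").orderBy("name").show();

        spark.stop();
    }
}
```

With this variant there is no need to go through a JavaRDD first, because the list of rows can be handed to createDataFrame directly.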
Richard Xin
On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <[email protected]>
wrote:
Hello!
I am running Spark from Java and have run into a problem that I can't solve,
and I couldn't find anything helpful among the answered questions, so I would
really appreciate your help.
I am running some calculations, creating rows for each result:

    List<Row> results = new LinkedList<Row>();
    for (something) {
        results.add(RowFactory.create(someStringVariable, someIntegerVariable));
    }
Now I have a list of rows that I need to turn into a DataFrame so that I can
perform some Spark SQL operations on it, like grouping and sorting. I would
like to keep the data types.
I tried:
Dataset<Row> toShow = spark.createDataFrame(results, Row.class);
but it throws a NullPointerException (spark is a SparkSession). Is my logic
wrong somewhere? Should this operation be possible and give me what I want? Or
do I have to create a custom class that implements Serializable and build a
list of those objects rather than Rows? Will I be able to run SQL queries on a
dataset consisting of custom class objects rather than Rows?
I'm sorry if this is a duplicate question.
Thank you for your help!
Karin
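
On the custom-class question above, here is a minimal sketch of what that route could look like, assuming a JavaBean with a no-argument constructor and getters and setters. The Result class, its fields name and value, the view name "results", and the class name BeanDataset are all hypothetical. The dataset is registered as a temporary view so that it can be queried with SQL.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BeanDataset {

    // Hypothetical bean; the columns are derived from its getters and setters.
    public static class Result implements Serializable {
        private String name;
        private int value;

        public Result() {}

        public Result(String name, int value) {
            this.name = name;
            this.value = value;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
    }

    // Creates a typed Dataset from the beans and runs a SQL query over it.
    static Dataset<Row> totalsByName(SparkSession spark, List<Result> results) {
        Dataset<Result> ds = spark.createDataset(results, Encoders.bean(Result.class));
        ds.createOrReplaceTempView("results");
        return spark.sql(
            "SELECT name, SUM(value) AS total FROM results GROUP BY name ORDER BY name");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("Rows2DataSet")
            .master("local")
            .getOrCreate();

        List<Result> results = Arrays.asList(
            new Result("a", 1), new Result("b", 2), new Result("a", 3));

        totalsByName(spark, results).show();

        spark.stop();
    }
}
```

Whether this is preferable to generic Rows with an explicit schema depends on how fixed the result structure is; the bean route gives compile-time field names, while the Row route avoids writing an extra class.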