Re: apache-spark: Converting List of Rows into Dataset Java

Richard Xin Tue, 28 Mar 2017 19:18:07 -0700

Maybe you could try something like this:

    SparkSession sparkSession = SparkSession
            .builder()
            .appName("Rows2DataSet")
            .master("local")
            .getOrCreate();

    List<Row> results = new LinkedList<Row>();
    JavaRDD<Row> jsonRDD =
            new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);

    Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
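
For comparison, here is a minimal sketch of a different route: passing an explicit StructType schema together with the list of rows. The createDataFrame overloads that take a Class argument are documented as expecting a JavaBean class, which may be why Row.class does not give the desired result. The class name RowsToDataset, the helper methods, the column names "name" and "value", and the sample data are all assumptions made for illustration; the original post only mentions a String variable and an Integer variable.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowsToDataset {

    // Hypothetical schema: one String column and one Integer column,
    // matching the (String, Integer) rows described in the question.
    static StructType schema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("name", DataTypes.StringType, true),
            DataTypes.createStructField("value", DataTypes.IntegerType, true)
        });
    }

    // Builds a DataFrame from an in-memory list of rows plus the schema above.
    static Dataset<Row> toDataFrame(SparkSession spark, List<Row> rows) {
        return spark.createDataFrame(rows, schema());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Rows2DataSet")
                .master("local")
                .getOrCreate();

        // Made-up sample data standing in for the computed results.
        List<Row> results = new ArrayList<>();
        results.add(RowFactory.create("a", 1));
        results.add(RowFactory.create("b", 2));
        results.add(RowFactory.create("a", 3));

        Dataset<Row> df = toDataFrame(spark, results);
        df.printSchema();

        // Grouping and sorting, as asked for in the question.
        df.groupBy("name").count().orderBy("name").show();

        spark.stop();
    }
}
```

With the schema given explicitly, the column types are taken from the StructType rather than inferred, so the String and Integer types should be preserved, and no intermediate JavaRDD is needed for a list that already lives on the driver.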


Richard Xin 

    On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <[email protected]> wrote:
 

Hello!

I am running Spark on Java and have run into a problem that I can't solve,
and I couldn't find anything helpful among the answered questions, so I would
really appreciate your help.

I am running some calculations and creating a row for each result:

    List<Row> results = new LinkedList<Row>();

    for (something) {
        results.add(RowFactory.create(someStringVariable, someIntegerVariable));
    }

Now I have a list of rows that I need to turn into a DataFrame so that I can
perform some Spark SQL operations on them, like grouping and sorting. I would
like to keep the data types.

I tried:

    Dataset<Row> toShow = spark.createDataFrame(results, Row.class);

but it throws a NullPointerException (spark being a SparkSession). Is my logic
wrong somewhere? Should this operation be possible and give me what I want? Or
do I have to create a custom class that implements Serializable and build a
list of those objects rather than Rows? Will I be able to run SQL queries on a
Dataset of custom class objects rather than Rows?

I'm sorry if this is a duplicate question. Thank you for your help!

Karin
