Looks like parallelizing the list into an RDD was the step I was missing:
JavaRDD<Row> jsonRDD = new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
Then I created a schema:
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("column_name1", DataTypes.StringType, true));
fields.add....
StructType schema = DataTypes.createStructType(fields);
and then, voilà! I have my dataset without any NullPointerException :)
Dataset<Row> resultDataset = sparkSession.createDataFrame(jsonRDD, schema);
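
For anyone finding this thread later, here is a minimal self-contained sketch of the steps above. The class name RowsToDataset, the helper toDataset, and the column names "name" and "value" are made up for illustration; the real schema has to match whatever is passed to RowFactory.create. As far as I can tell, SparkSession also has a createDataFrame overload that takes a List<Row> plus a StructType directly, so the parallelize step may be optional, and the sketch uses that overload. The overload taking a Class is documented as expecting a JavaBean class, which Row is not, and that is probably why my first attempt failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowsToDataset {

    // Hypothetical helper: builds a DataFrame from in-memory rows
    // using an explicit schema (column names are assumptions).
    static Dataset<Row> toDataset(SparkSession spark, List<Row> rows) {
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
        fields.add(DataTypes.createStructField("value", DataTypes.IntegerType, true));
        StructType schema = DataTypes.createStructType(fields);

        // Overload taking a List<Row> and a StructType; no RDD needed here.
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Rows2DataSet")
                .master("local")
                .getOrCreate();

        // Each row must follow the schema order: a String, then an Integer.
        List<Row> results = Arrays.asList(
                RowFactory.create("a", 1),
                RowFactory.create("b", 2),
                RowFactory.create("a", 3));

        Dataset<Row> df = toDataset(spark, results);

        // The column types are kept, so grouping and sorting work as usual.
        df.groupBy("name").sum("value").orderBy("name").show();

        spark.stop();
    }
}
```

The explicit schema is what keeps the data types; the values in each Row just have to appear in the same order as the fields.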
Thanks a lot!!
Have a nice day,
Karin
On Wed, Mar 29, 2017 at 4:17 AM, Richard Xin <[email protected]>
wrote:
> Maybe you could try something like that:
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("Rows2DataSet")
> .master("local")
> .getOrCreate();
> List<Row> results = new LinkedList<Row>();
> JavaRDD<Row> jsonRDD =
> new JavaSparkContext(sparkSession.
> sparkContext()).parallelize(results);
>
> Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD,
> Row.class);
>
> Richard Xin
>
>
> On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <[email protected]>
> wrote:
>
>
> Hello!
>
> I am running Spark on Java and bumped into a problem I can't solve or find
> anything helpful among answered questions, so I would really appreciate
> your help.
>
> I am running some calculations, creating rows for each result:
>
> List<Row> results = new LinkedList<Row>();
>
> for(something){
> results.add(RowFactory.create( someStringVariable, someIntegerVariable ));
> }
>
> Now I ended up with a list of rows I need to turn into dataframe to
> perform some spark sql operations on them, like groupings and sorting.
> Would like to keep the dataTypes.
>
> I tried:
>
> Dataset<Row> toShow = spark.createDataFrame(results, Row.class);
>
> but it throws nullpointer. (spark being SparkSession) Is my logic wrong
> there somewhere, should this operation be possible, resulting in what I
> want?
> Or do I have to create a custom class which extends serializable and
> create a list of those objects rather than Rows? Will I be able to perform
> SQL queries on dataset consisting of custom class objects rather than rows?
>
> I'm sorry if this is a duplicate question.
> Thank you for your help!
> Karin
>
>
>
--
datapine GmbH
Skalitzer Straße 33
10999 Berlin
email: [email protected]