In case you are specifically looking for a createDataFrame method, you can use:


sparkSession.createDataFrame(
    Arrays.asList("apple", "orange", "banana").stream()
        .map(RowFactory::create)
        .collect(Collectors.toList()),
    new StructType().add("fruits", "string")
).show();
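
For completeness, here is a minimal self-contained sketch of the same createDataFrame approach, using the column name "text" from the original question. The class name StringsToDataFrameWithSchema, the helper fromStrings, and the local[1] master are only illustrative assumptions, not part of the Spark API or of the code in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used for illustration only.
public class StringsToDataFrameWithSchema {

    // Hypothetical helper: builds a one-column DataFrame from a list of Strings
    // by wrapping each value in a Row and supplying an explicit schema.
    static Dataset<Row> fromStrings(SparkSession spark, List<String> values, String columnName) {
        // Single string column with the requested name.
        StructType schema = new StructType().add(columnName, DataTypes.StringType);
        // Each String becomes a Row with exactly one field.
        List<Row> rows = values.stream()
                .map(v -> RowFactory.create(v))
                .collect(Collectors.toList());
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Assumption: a local single-threaded session is sufficient for a quick try-out.
        SparkSession spark = SparkSession.builder()
                .appName("strings-to-df-with-schema")
                .master("local[1]")
                .getOrCreate();
        try {
            Dataset<Row> data = fromStrings(spark, Arrays.asList("apple", "orange", "banana"), "text");
            data.printSchema();
            data.show();
        } finally {
            spark.stop();
        }
    }
}
```

The explicit schema is mostly useful when you want to control the column name and type up front instead of renaming the column afterwards.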

Regards,
Vibhor Gupta
________________________________
From: [email protected] <[email protected]>
Sent: Thursday, November 18, 2021 5:37 PM
To: Vibhor Gupta <[email protected]>
Cc: [email protected] <[email protected]>; user <[email protected]>
Subject: Re: EXT: Re: Create Dataframe from a single String in Java

Thanks a lot, Sebastian and Vibhor. You're right, I can also call 
createDataset() on the Spark session. Not sure how I missed that.

Cheers,

Martin




On 2021-11-18 12:01, Vibhor Gupta wrote:

You can try something like the code below. It creates a Dataset and then 
converts it into a DataFrame:


sparkSession.createDataset(
    Arrays.asList("apple","orange","banana"),
    Encoders.STRING()
).toDF("fruits").show();
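
Applied to the original question (one String, one column named "text"), the same idea could look roughly like the sketch below. The class name SingleStringToDataFrame, the helper fromString, and the local[1] master are hypothetical names chosen for illustration; only createDataset, Encoders.STRING() and toDF come from the snippet above.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used for illustration only.
public class SingleStringToDataFrame {

    // Hypothetical helper: wraps one String in a single-column DataFrame.
    static Dataset<Row> fromString(SparkSession spark, String text, String columnName) {
        // A one-element list is enough; createDataset accepts any java.util.List.
        List<String> textList = Collections.singletonList(text);
        // Dataset<String> has a default column called "value"; toDF renames it.
        return spark.createDataset(textList, Encoders.STRING()).toDF(columnName);
    }

    public static void main(String[] args) {
        // Assumption: a local single-threaded session is sufficient for a quick try-out.
        SparkSession spark = SparkSession.builder()
                .appName("single-string-to-df")
                .master("local[1]")
                .getOrCreate();
        try {
            Dataset<Row> data = fromString(spark, "some text", "text");
            data.printSchema();
            data.show();
        } finally {
            spark.stop();
        }
    }
}
```

Using toDF("text") here has the same effect as withColumnRenamed("value", "text") for a single-column Dataset of Strings.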

Regards,
Vibhor Gupta.

________________________________
From: Sebastian Piu <[email protected]>
Sent: Thursday, November 18, 2021 4:20 PM
To: [email protected] <[email protected]>
Cc: user <[email protected]>
Subject: EXT: Re: Create Dataframe from a single String in Java

You can call that on sparkSession too.

On Thu, 18 Nov 2021, 10:48, <[email protected]> wrote:

PS: The following works, but it seems rather awkward having to use the 
SQLContext here.

SQLContext sqlContext = new SQLContext(sparkContext);

Dataset<Row> data = sqlContext
      .createDataset(textList, Encoders.STRING())
      .withColumnRenamed("value", "text");




On 2021-11-18 11:26, [email protected] wrote:

Hello,

I am struggling with a task that should be super simple: I would like to create 
a Spark DF of Type Dataset<Row> with one column from a single String (or from a 
one-element List of Strings). The column header should be "text".

SparkContext.parallelize() does not work, because it returns RDD<T> rather than 
Dataset<Row>, and it takes a "ClassTag" as its third parameter.

I am able to convert a List of Strings to JavaRDD<String> using this:
    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);

    JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);

But then I am stuck with this javaRDD. Besides, it seems overly complex having 
to create an intermediate representation.

There is also this SO post with a solution in Scala that I have not been able 
to convert to Java, because the APIs differ:

https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string

Basically, what I am looking for is something simple like:

    Dataset<Row> myData = sparkSession.createDataFrame(textList, "text");

Any hints? Thanks a lot.

Cheers,

Martin
