Thanks a lot, Sebastian and Vibhor. You're right, I can also call
createDataset() on the SparkSession. Not sure how I missed that.
Cheers,
Martin
On 2021-11-18 12:01, Vibhor Gupta wrote:
You can try something like the following. It creates a Dataset and then
converts it into a DataFrame.
sparkSession.createDataset(
Arrays.asList("apple","orange","banana"),
Encoders.STRING()
).toDF("fruits").show();
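Applied to the original question (a single String, column named "text"), the same createDataset(...).toDF(...) pattern might look like the sketch below. It assumes Spark 2.x or 3.x on the classpath and a local master; the class name SingleStringDataFrame, the helper fromString, and the app name are illustrative only and do not come from the thread.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SingleStringDataFrame {

    // Hypothetical helper: builds a one-column DataFrame named "text" from one String.
    public static Dataset<Row> fromString(SparkSession spark, String text) {
        // Wrap the single String in a one-element list.
        List<String> textList = Collections.singletonList(text);
        // createDataset yields a Dataset<String> whose only column is "value";
        // toDF("text") renames that column and returns a Dataset<Row>.
        return spark.createDataset(textList, Encoders.STRING()).toDF("text");
    }

    public static void main(String[] args) {
        // Local session for illustration; master and app name are assumptions.
        SparkSession spark = SparkSession.builder()
                .appName("single-string-df")
                .master("local[*]")
                .getOrCreate();
        try {
            fromString(spark, "hello world").show();
        } finally {
            spark.stop();
        }
    }
}
```

No SQLContext is needed here: the SparkSession exposes createDataset() directly.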
Regards,
Vibhor Gupta.
-------------------------
From: Sebastian Piu <sebastian....@gmail.com>
Sent: Thursday, November 18, 2021 4:20 PM
To: mar...@wunderlich.com <mar...@wunderlich.com>
Cc: user <user@spark.apache.org>
Subject: EXT: Re: Create Dataframe from a single String in Java
You can call that on sparkSession too.
On Thu, 18 Nov 2021, 10:48, <mar...@wunderlich.com> wrote:
PS: The following works, but it seems rather awkward to have to use the
SQLContext here.
SQLContext sqlContext = new SQLContext(sparkContext);
Dataset<Row> data = sqlContext
.createDataset(textList, Encoders.STRING())
.withColumnRenamed("value", "text");
On 2021-11-18 11:26, mar...@wunderlich.com wrote:
Hello,
I am struggling with a task that should be super simple: I would like
to create a Spark DataFrame of type Dataset<Row> with one column from a
single String (or from a one-element List of Strings). The column header
should be "text".
SparkContext.parallelize() does not work, because it returns an RDD<T>
rather than a Dataset<Row>, and it takes a ClassTag as its third parameter.
I am able to convert a List of Strings to a JavaRDD<String> using this:
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);
But then I am stuck with this JavaRDD. Besides, it seems overly complex
to have to create an intermediate representation.
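If you already have a JavaRDD<String>, one way to get from it to a Dataset<Row> is to wrap each String in a Row and pass an explicit schema to SparkSession.createDataFrame(JavaRDD<Row>, StructType). The sketch below assumes Spark 2.x or 3.x and a local master; the class name RddToDataFrame and the helper fromRdd are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RddToDataFrame {

    // Hypothetical helper: turns a JavaRDD<String> into a DataFrame with one "text" column.
    public static Dataset<Row> fromRdd(SparkSession spark, JavaRDD<String> javaRdd) {
        // Wrap each String in a one-field Row.
        JavaRDD<Row> rows = javaRdd.map(s -> RowFactory.create(s));
        // Schema: a single nullable string column named "text".
        StructType schema = new StructType().add("text", DataTypes.StringType, true);
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for illustration; master and app name are assumptions.
        SparkSession spark = SparkSession.builder()
                .appName("rdd-to-df")
                .master("local[*]")
                .getOrCreate();
        try {
            // Reuse the session's SparkContext instead of creating a second one.
            JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
            List<String> textList = Collections.singletonList("hello world");
            JavaRDD<String> javaRdd = jsc.parallelize(textList);
            fromRdd(spark, javaRdd).show();
        } finally {
            spark.stop();
        }
    }
}
```

A shorter variant of the same conversion is spark.createDataset(javaRdd.rdd(), Encoders.STRING()).toDF("text"), though for a local List the intermediate RDD is unnecessary.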
There is also this SO post with a solution in Scala that I have not
been able to convert to Java, because the APIs differ:
https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string
Basically, what I am looking for is something simple like:
Dataset<Row> myData = sparkSession.createDataFrame(textList, "text");
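Spark does not offer a createDataFrame(List<String>, String) overload, but a small helper that builds Rows plus a one-field StructType and calls the existing SparkSession.createDataFrame(List<Row>, StructType) comes close to the wished-for call. This is only a sketch: the class CreateDataFrameHelper and its createDataFrame method are hypothetical and not part of Spark's API; Spark 2.x or 3.x and a local master are assumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CreateDataFrameHelper {

    // Hypothetical helper approximating createDataFrame(textList, "text").
    public static Dataset<Row> createDataFrame(SparkSession spark, List<String> values, String columnName) {
        // One Row per input String.
        List<Row> rows = values.stream()
                .map(v -> RowFactory.create(v))
                .collect(Collectors.toList());
        // Single nullable string column with the requested name.
        StructType schema = new StructType().add(columnName, DataTypes.StringType, true);
        // Existing Spark API: build a DataFrame from a local List<Row> and a schema.
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for illustration; master and app name are assumptions.
        SparkSession spark = SparkSession.builder()
                .appName("create-df-helper")
                .master("local[*]")
                .getOrCreate();
        try {
            List<String> textList = Arrays.asList("hello world");
            createDataFrame(spark, textList, "text").show();
        } finally {
            spark.stop();
        }
    }
}
```

The explicit schema makes the column name and type visible at the call site, at the cost of a little more code than the createDataset(...).toDF("text") approach.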
Any hints? Thanks a lot.
Cheers,
Martin