Re: Create Dataframe from a single String in Java

martin Thu, 18 Nov 2021 02:47:26 -0800

PS: The following works, but it seems rather awkward to have to use the SQLContext here.


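// new SQLContext(SparkContext) still works, but it has been deprecated since Spark 2.0
SQLContext sqlContext = new SQLContext(sparkContext);

// createDataset yields a Dataset<String> whose only column is named "value";
// renaming that column to "text" produces a Dataset<Row>
Dataset<Row> data = sqlContext
      .createDataset(textList, Encoders.STRING())
      .withColumnRenamed("value", "text");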
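For comparison, here is a minimal sketch of the same idea without SQLContext, assuming Spark 2.x or later with `spark-sql` on the classpath. `SparkSession`, the entry point that superseded `SQLContext` in Spark 2.0, offers the same `createDataset(List, Encoder)` method, and `toDF("text")` renames the single `value` column in the same step. The class name `TextDataFrameExample`, the helper `textFrame` and the local master URL are illustrative assumptions.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TextDataFrameExample {

    // Hypothetical helper: builds a one-column DataFrame named "text" from a list of strings.
    static Dataset<Row> textFrame(SparkSession spark, List<String> textList) {
        // createDataset(List, Encoder) yields a Dataset<String> with one column called "value";
        // toDF("text") renames that column and returns an untyped Dataset<Row>.
        return spark.createDataset(textList, Encoders.STRING()).toDF("text");
    }

    public static void main(String[] args) {
        // Local master only for trying this out; an application would reuse its existing session.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("text-df-example")
                .getOrCreate();
        try {
            Dataset<Row> data = textFrame(spark, Collections.singletonList("hello world"));
            data.show();
        } finally {
            spark.stop();
        }
    }
}
```

`toDF("text")` and `withColumnRenamed("value", "text")` should give the same result here; `toDF` simply avoids depending on the default column name `value`.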

On 2021-11-18 11:26, [email protected] wrote:

Hello,
I am struggling with a task that should be super simple: I would like to create a Spark DF of type Dataset<Row> with one column from a single String (or from a one-element List of Strings). The column header should be "text".
SparkContext.parallelize() does not work, because it returns RDD<T> and not Dataset<Row>, and it takes a ClassTag as its third parameter.
I am able to convert a List of Strings to a JavaRDD<String> using this:
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);

JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);
But then I am stuck with this JavaRDD. Besides, it seems overly complex to have to create an intermediate representation.
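One way to get from a `JavaRDD<String>` to a `Dataset<Row>` is to wrap each string in a one-field `Row` and supply an explicit schema via `SparkSession.createDataFrame(JavaRDD<Row>, StructType)`. This is a minimal sketch, assuming Spark 2.x or later; `RddToTextFrame`, `fromRdd` and `TEXT_SCHEMA` are hypothetical names, not part of Spark.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RddToTextFrame {

    // Hypothetical schema: a single nullable string column called "text".
    static final StructType TEXT_SCHEMA = new StructType()
            .add("text", DataTypes.StringType, true);

    // Wraps each string in a one-field Row, then applies the schema.
    static Dataset<Row> fromRdd(SparkSession spark, JavaRDD<String> javaRdd) {
        JavaRDD<Row> rows = javaRdd.map(s -> RowFactory.create(s));
        return spark.createDataFrame(rows, TEXT_SCHEMA);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("rdd-to-text-frame")
                .getOrCreate();
        try {
            // Same JavaSparkContext/parallelize steps as in the original message.
            JavaSparkContext javaSparkContext = new JavaSparkContext(spark.sparkContext());
            List<String> textList = Arrays.asList("first line", "second line");
            JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);
            fromRdd(spark, javaRdd).show();
        } finally {
            spark.stop();
        }
    }
}
```

This works, but for a handful of driver-side strings the RDD detour is indeed unnecessary; it mainly pays off when the data already lives in an RDD.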
There is also this SO post with a solution in Scala that I have not been able to convert to Java, because the APIs differ:
https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string

Basically, what I am looking for is something simple like:

Dataset<Row> myData = sparkSession.createDataFrame(textList, "text");
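Spark's Java API has no `createDataFrame(List<String>, String)` overload of that shape, but `SparkSession.createDataFrame(List<Row>, StructType)` comes close and needs no JavaSparkContext or RDD. Below is a minimal sketch of a helper with the wished-for signature plus the session argument, assuming Spark 2.x or later and Java 8+; `TextFrames.createDataFrame` is a hypothetical helper, not a Spark method.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public final class TextFrames {

    private TextFrames() {}

    // Hypothetical helper: single-column string DataFrame with the given column name.
    public static Dataset<Row> createDataFrame(SparkSession spark,
                                               List<String> values,
                                               String columnName) {
        StructType schema = new StructType().add(columnName, DataTypes.StringType, true);
        // One Row per input string, built on the driver.
        List<Row> rows = values.stream()
                .map(v -> RowFactory.create(v))
                .collect(Collectors.toList());
        // createDataFrame(List<Row>, StructType) accepts a plain java.util.List.
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("text-frames")
                .getOrCreate();
        try {
            Dataset<Row> myData =
                    createDataFrame(spark, Collections.singletonList("a single string"), "text");
            myData.show();
        } finally {
            spark.stop();
        }
    }
}
```

Called as `TextFrames.createDataFrame(sparkSession, textList, "text")`, this is close to the one-liner above; the explicit schema also makes the column type visible at the call site.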

Any hints? Thanks a lot.

Cheers,

Martin
