Many thanks all, especially to Mich. That is what I was looking for.
On Friday, 15 October 2021, 09:28:24 BST, Mich Talebzadeh wrote:
Spark allows one to define the column format as a StructType or a list. By default, Spark assumes that all fields are nullable when creating a DataFrame.
To change nullability you need to provide the structure of the columns explicitly. Assume that I have created an RDD in the form

rdd = sc.parallelize(range(10))
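You can then attach an explicit schema when building the DataFrame. A minimal sketch (the column names and types here are illustrative, and it assumes an active SparkContext sc and SparkSession spark):

from pyspark.sql.types import StructType, StructField, LongType, StringType

# Illustrative schema: nullability is set per field
schema = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("name", StringType(), nullable=True),
])

# Build rows from the RDD and apply the schema
rows = sc.parallelize(range(10)).map(lambda i: (i, "name_" + str(i)))
df = spark.createDataFrame(rows, schema)
df.printSchema()
# root
#  |-- id: long (nullable = false)
#  |-- name: string (nullable = true)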
I see some nice answers at
https://stackoverflow.com/questions/46072411/can-i-change-the-nullability-of-a-column-in-my-spark-dataframe
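In case the link goes stale, the gist of the approach discussed there is to rebuild the schema with the desired nullability and re-create the DataFrame from the underlying RDD. A rough sketch, assuming an existing DataFrame df and a SparkSession spark:

from pyspark.sql.types import StructType, StructField

# Copy df's schema, forcing every field to non-nullable
new_schema = StructType([
    StructField(f.name, f.dataType, nullable=False) for f in df.schema.fields
])

# Re-create the DataFrame with the modified schema
df_nonnull = spark.createDataFrame(df.rdd, new_schema)
df_nonnull.printSchema()

Bear in mind that if a column actually contains nulls, marking it non-nullable can cause errors at runtime, so this is best done only when you know your data.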
On Thu, 14 Oct 2021 at 5:21 PM, ashok34...@yahoo.com.INVALID wrote:
> Gurus,
>
> I have an RDD in PySpark that I can convert to DF through
>
> df = rdd.toDF()
>
> However, when I do
>
> df.printSchema()
>
> I see the columns as nullable = true by default:
>
> root
>  |-- COL-1: long (nullable = true)
>  |-- COl-2: double (nullable = true)
>  |-- COl-3: string (nullable = true)
>
> What would be the easiest way to make these columns non-nullable?