Hello,
I am struggling with a task that should be super simple:
I define a StructType to load JSON data from Kafka with Spark Structured
Streaming, and some fields may have no value. How can I set a default value
for such a field?
For example:
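
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Schema of the JSON payload; every field is optional.
val schema = StructType(
  Array(
    StructField("a", StringType, nullable = true),
    StructField("b", StringType, nullable = true),
    StructField("c", StringType, nullable = true)
  )
)

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input-topic")
  .option("failOnDataLoss", "false")
  .load()
  // Parse the Kafka value as JSON with the schema above; absent fields become null.
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")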
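
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

// format, checkpoint and path are placeholders defined elsewhere.
df.writeStream
  .format(format)
  .option("checkpointLocation", checkpoint)
  .option("path", path)
  .outputMode(OutputMode.Append)
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
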
If the input data has no b, how can I set a default value (e.g. "xxx") for it? Is a UDF the only way?
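
To make the question concrete, here is a minimal sketch of the kind of non-UDF approach I am hoping exists. It assumes that a field missing from the JSON ends up as null after parsing, and it uses the built-in coalesce and lit functions; withDefaultB is a hypothetical helper name, and I have not confirmed this is the recommended way.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Hypothetical helper: replace a null "b" with the default value "xxx".
// Assumes `parsed` already has the columns a, b and c from the schema.
def withDefaultB(parsed: DataFrame): DataFrame =
  parsed.withColumn("b", coalesce(col("b"), lit("xxx")))

// Possible alternative using the DataFrame null-handling API:
// parsed.na.fill("xxx", Seq("b"))
```

The idea would be to call withDefaultB on the parsed DataFrame before writeStream, so the sink only ever sees rows where b is filled in.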