Hello!
I am attempting to write a streaming pipeline that consumes data from a Kafka
source, transforms the data, and then writes the results to a downstream sink
(Kafka, Redis, etc.). I want to write fully formed SQL instead of using the
DataFrame function API that Spark offers. I read a few guides on how to do this.
OK, let us take it for a test.
Here is my original code:
from pyspark.sql.types import StructType, StringType, TimestampType, IntegerType

def fetch_data(self):
    self.sc.setLogLevel("ERROR")
    # Schema of the incoming records
    schema = StructType() \
        .add("rowkey", StringType()) \
        .add("timestamp", TimestampType()) \
        .add("temperature", IntegerType())
    checkpoi
Hi Mich,
Thank you so much for your response. I really appreciate your help!
You mentioned "defining the watermark using the withWatermark function on the
streaming_df before creating the temporary view" - I believe this is what I'm
doing, and it's not working for me. Here is the exact code snippet:
Hmm. You are getting the error below:
AnalysisException: Append output mode not supported when there are
streaming aggregations on streaming DataFrames/DataSets without watermark;
The problem seems to be that you are using the append output mode when
writing the streaming query results to Kafka while the query contains a
streaming aggregation with no watermark. Append mode can only emit an
aggregated row once Spark knows that row is final; without a watermark,
Spark can never decide that a window is complete, so it raises this
exception at query start.