Hello!
I am attempting to write a streaming pipeline that consumes data from a Kafka
source, transforms the data, and then writes the results to a downstream sink
(Kafka, Redis, etc.). I want to write fully formed SQL instead of using the
DataFrame function API that Spark offers. I read a few guides on how to do this.
OK, let us take it for a test.
Here is my original code:
from pyspark.sql.types import StructType, StringType, TimestampType, IntegerType

def fetch_data(self):
    self.sc.setLogLevel("ERROR")
    # Schema of the incoming records
    schema = StructType() \
        .add("rowkey", StringType()) \
        .add("timestamp", TimestampType()) \
        .add("temperature", IntegerType())
    checkpoi
Hi Mich,
Thank you so much for your response. I really appreciate your help!
You mentioned "defining the watermark using the withWatermark function on the
streaming_df before creating the temporary view" - I believe this is what I'm
doing, and it's not working for me. Here is the exact code snippet:
Hmm. You are getting the error below:
AnalysisException: Append output mode not supported when there are
streaming aggregations on streaming DataFrames/DataSets without watermark;
The problem seems to be that you are using the append output mode when
writing the streaming query results to Kafka while the query contains a
streaming aggregation with no watermark. Append mode can only emit an
aggregated row once Spark knows that row is final; without a watermark,
Spark can never decide that a window is complete, so it raises this
exception at query start.