Re: Spark StructuredStreaming - watermark not working as expected

2023-03-17 Thread karan alang
Hi Mich, I'm currently testing this on my mac .. are you able to reproduce this issue ? Note - the code is similar .. except outputMode is set to update. wrt outputMode - when using aggregation + watermark, the outputMode should be either append Or update, in your code - you have used 'complete'

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-17 Thread Mich Talebzadeh
Hi Karan, The version tested was 3.1.1. Are you running on Dataproc serverless 3.1.3? Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile https://en.everybodywiki.com

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-16 Thread karan alang
Fyi .. apache spark version is 3.1.3 On Wed, Mar 15, 2023 at 4:34 PM karan alang wrote: > Hi Mich, this doesn't seem to be working for me .. the watermark seems to > be getting ignored ! > > Here is the data put into Kafka : > > ``` > > > +

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-15 Thread karan alang
Hi Mich, this doesn't seem to be working for me .. the watermark seems to be getting ignored ! Here is the data put into Kafka : ``` +---++ |value |key | +---

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-12 Thread Mich Talebzadeh
OK ts is the timestamp right? This is a similar code that works out the average temperature with time frame of 5 minutes. Note the comments and catch error with try: try: # construct a streaming dataframe streamingDataFrame that subscribes to topic temperature s

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread karan alang
Hi Mich - Here is the output of the ldf.printSchema() & ldf.show() commands. ldf.printSchema() root |-- applianceName: string (nullable = true) |-- timeslot: long (nullable = true) |-- customer: string (nullable = true) |-- window: struct (nullable = false) ||-- start: timestamp (nullabl

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread Mich Talebzadeh
Just looking at the code in here ldf = ldf.groupBy("applianceName", "timeslot", "customer", window(col("ts"), "15 minutes")) \ .agg({'sentOctets':"sum", 'recvdOctets':"sum"}) \ .withColumnRenamed('sum(sentOctets)', 'sentOctets') \ .wi