Re: Calculate average from Spark stream

2021-05-21 Thread Mich Talebzadeh
OK, where is your watermark created? That is the one that works out the average temperature!
# construct a streaming dataframe streamingDataFrame that subscribes to topic temperature
streamingDataFrame = self.spark \
    .readStream \
    .format("kaf

Re: Calculate average from Spark stream

2021-05-18 Thread Mich Talebzadeh
Something like below:
root
 |-- window: struct (nullable = false)
 |    |-- start: timestamp (nullable = true)
 |    |-- end: timestamp (nullable = true)
 |-- avg(temperature): double (nullable = true)
import pyspark.sql.function

Re: Calculate average from Spark stream

2021-05-18 Thread Mich Talebzadeh
OK, let me provide some suggestions here. ResultM is a data frame, and if you call ResultM.printSchema() you will get the struct column called window, with two columns named start and end, plus the average temperature. Just try to confirm that now. HTH, Mich On Tue, 18 May 2021 at 14:15, Giuseppe Ri

Re: Calculate average from Spark stream

2021-05-17 Thread Mich Talebzadeh
Hi Giuseppe, How have you defined your resultM above in qK? Cheers

Re: Calculate average from Spark stream

2021-05-17 Thread Mich Talebzadeh
Hi Giuseppe, Your error states: Required attribute 'value' not found. First, can you read your streaming data OK? In my case the stream data is in JSON format. I have three columns in JSON format, for example: {"rowkey":"f0577406-a7d3-4c52-9998-63835ea72133", "timestamp":"2021-05-17T15:17:27", "tempera

Re: Calculate average from Spark stream

2021-05-15 Thread Mich Talebzadeh
Hi, In answer to your question I did some tests using broadly your approach. With regard to your questions: "but it does not work well because it does not give a temperature average as you can see in the attached pic. Why is the average not calculated on temperature? How can I view data in each w

Re: Calculate average from Spark stream

2021-05-12 Thread Mich Talebzadeh
Have you managed to sort out this problem and the reason this solution is not working? Bottom line, your temperature data comes in streams every two seconds and you want an average of temperature for the past 300 seconds' worth of data; in other words, your window length is 300 seconds? You also w
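The requirement summarised above (a reading every two seconds, averaged over the past 300 seconds) can be illustrated without Spark. The following is only a minimal plain-Python sketch of a trailing-window average, not the code from the thread; the class name TrailingAverage and the sample temperatures are hypothetical.

```python
from collections import deque


class TrailingAverage:
    """Average of the readings seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._items = deque()  # (epoch_seconds, temperature), oldest first
        self._total = 0.0

    def add(self, ts, temperature):
        """Record a reading and return the average over (ts - window, ts]."""
        self._items.append((ts, temperature))
        self._total += temperature
        # Evict readings that fall outside the trailing window.
        while self._items[0][0] <= ts - self.window_seconds:
            _, old_temperature = self._items.popleft()
            self._total -= old_temperature
        return self._total / len(self._items)


# Hypothetical stream: one reading every 2 seconds for 10 minutes.
avg = TrailingAverage(window_seconds=300)
latest = None
for t in range(0, 600, 2):
    latest = avg.add(t, 10.0 if t < 300 else 20.0)
```

With a two-second cadence the window never holds more than 150 readings, so memory stays bounded no matter how long the stream runs.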

Re: Calculate average from Spark stream

2021-05-10 Thread Lalwani, Jayesh
You don’t need to “launch batches” every 5 minutes. You can launch batches every 2 seconds, and aggregate on a window of 5 minutes. Spark will read data from the topic every 2 seconds, and keep the data in memory for 5 minutes. You need to make a few decisions: 1. Do you want a tumbling window or a
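To make the tumbling-window option concrete, here is a minimal plain-Python sketch (deliberately not Spark code) that assigns each reading to a fixed, non-overlapping 5-minute bucket and averages per bucket. The function name tumbling_window_averages and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute window, as discussed in the thread


def tumbling_window_averages(readings, window_seconds=WINDOW_SECONDS):
    """Average (epoch_seconds, temperature) pairs per non-overlapping window.

    Returns a dict mapping each window's start time to its average.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, temperature in readings:
        # Each reading belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        sums[window_start] += temperature
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical data: one reading every 2 seconds for 10 minutes.
readings = [(t, 20.0 if t < 300 else 30.0) for t in range(0, 600, 2)]
averages = tumbling_window_averages(readings)
```

Each reading lands in exactly one bucket here; an overlapping window would instead assign a reading to every window that covers its timestamp.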

Re: Calculate average from Spark stream

2021-05-10 Thread Mich Talebzadeh
Hi Giuseppe, Just looked over your PySpark code. You are doing Spark Structured Streaming (SSS). Your Kafka topic sends messages every two seconds and, regardless, you want to enrich the data every 5 minutes. In other words, wait for 5 minutes to build the batch. You can either run wait for 5 minut