Hi all, I have recently started evaluating Structured Streaming and ran into a little snag right from the beginning.
Basically, I want to read some data, do some basic aggregation, and write the result to a file sink:

  import org.apache.spark.sql.functions.avg
  import org.apache.spark.sql.streaming.ProcessingTime

  val rawRecords = spark.readStream.schema(myschema).parquet("/mytest")

  val q = rawRecords.withColumn("g", $"id" % 100).groupBy("g").agg(avg($"id"))

  val res = q.writeStream
    .outputMode("complete")
    .trigger(ProcessingTime("10 seconds"))
    .format("parquet")
    .option("path", "/test2")
    .option("checkpointLocation", "/mycheckpoint")
    .start()

The problem is that Spark tells me the parquet sink does not support complete mode (or update, for that matter). So how would I stream an aggregation to a file sink?

In general, my goal is to have a single (slow) streaming process that writes some profile data, and a second streaming process that loads the current profile dataframe to use in a join (I would stop the second streaming process and reload the dataframe periodically). Rough sketches of both parts are in the P.S. below.

Any help would be appreciated.

Thanks,
Assaf.
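P.S. For reference, here is a minimal sketch of the append-mode variant that I understand the file sink requires. It assumes an event-time column — I call it "ts" here, which my real schema may not have — so that a watermark and window can be defined; without a watermark, an aggregation in append mode is rejected as well:

  import org.apache.spark.sql.functions.{avg, window}
  import org.apache.spark.sql.streaming.ProcessingTime

  val rawRecords = spark.readStream.schema(myschema).parquet("/mytest")

  val agg = rawRecords
    .withWatermark("ts", "1 hour")                  // allow data up to 1 hour late
    .withColumn("g", $"id" % 100)
    .groupBy(window($"ts", "10 minutes"), $"g")     // windowed aggregation per group
    .agg(avg($"id").as("avg_id"))                   // alias so the column name is parquet-friendly

  val res = agg.writeStream
    .outputMode("append")                           // the only mode the file sink accepts
    .trigger(ProcessingTime("10 seconds"))
    .format("parquet")
    .option("path", "/test2")
    .option("checkpointLocation", "/mycheckpoint")
    .start()

My understanding is that in append mode a window's result is only written once the watermark passes the end of that window, so the output lags behind the data.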
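P.P.S. And this is roughly what I have in mind for the second process: periodically reload the latest profile output as a static dataframe and join it against the live stream. The output path "/joined" and checkpoint "/mycheckpoint2" are just placeholders:

  // Restart the query periodically so the profile snapshot gets refreshed.
  while (true) {
    val profile = spark.read.parquet("/test2")      // current profile written by the first query

    val joined = spark.readStream
      .schema(myschema)
      .parquet("/mytest")
      .withColumn("g", $"id" % 100)
      .join(profile, "g")                           // stream-static inner join

    val q2 = joined.writeStream
      .format("parquet")
      .option("path", "/joined")
      .option("checkpointLocation", "/mycheckpoint2")
      .start()

    q2.awaitTermination(60 * 60 * 1000)             // let it run for an hour
    q2.stop()                                       // stop, then loop to reload the profile
  }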