In append mode, the aggregation outputs a row only once the watermark has passed the end of the corresponding window and the aggregate is *final*, that is, it will not be updated any more. See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
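Concretely, with your 1-day windows and a 1-minute watermark, a given day's aggregate is only considered final (and only then emitted in append mode) once the watermark moves past the end of that day, so the console can stay empty for a long time even though data is being aggregated. A minimal sketch of the append-mode pattern, assuming a streaming DataFrame named `frame` with `timestamp` and `bytes` columns as in your example:

from pyspark.sql import functions as F

agg = frame \
    .withWatermark("timestamp", "1 minute") \
    .groupBy(F.window("timestamp", "1 day")) \
    .agg(F.sum("bytes").alias("total_bytes"))

# In append mode a window's row is emitted only after the watermark passes
# the end of that window, i.e. once the aggregate can no longer change.
# With 1-day windows and a 1-minute watermark, no rows reach the sink until
# roughly a day of event time has elapsed.
query = agg.writeStream \
    .format("console") \
    .outputMode("append") \
    .start()

The same applies to the parquet sink: the files stay empty until the first window is finalized.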
On Mon, Aug 14, 2017 at 4:09 PM, Ashwin Raju <ther...@gmail.com> wrote:
> Hi,
>
> I am running Spark 2.2 and trying out structured streaming. I have the
> following code:
>
> from pyspark.sql import functions as F
>
> df = frame \
>     .withWatermark("timestamp", "1 minute") \
>     .groupby(F.window("timestamp", "1 day"), *groupby_cols) \
>     .agg(F.sum('bytes'))
>
> query = df.writeStream \
>     .format("console") \
>     .option("checkpointLocation", '\some\chkpoint') \
>     .outputMode("complete") \
>     .start()
>
> query.awaitTermination()
>
> It prints out a bunch of aggregated rows to the console. When I run the
> same query with outputMode("append"), however, the output only has the
> column names, no rows. I was originally trying to output to parquet, which
> only supports append mode. I was seeing no data in my parquet files, so I
> switched to console output to debug, then noticed this issue. Am I
> misunderstanding something about how append mode works?
>
> Thanks,
> Ashwin