https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts
From the docs linked above:

> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then discards the source data.
> It only keeps around the minimal intermediate *state* data as required to
> update the result (e.g. intermediate counts in the earlier example).

On Tue, Aug 27, 2019 at 1:21 PM Nick Dawes <nickdawe...@gmail.com> wrote:

> I have a quick newbie question.
>
> Spark Structured Streaming creates an unbounded dataframe that keeps
> appending rows to it.
>
> So what's the max size of data it can hold? What if the size becomes
> bigger than the JVM heap? Will it spill to disk? I'm using S3 as storage.
> So will it write temp data on S3 or on the local file system of the
> cluster?
>
> Nick
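To make the "intermediate counts" point from the docs concrete, here is a toy sketch in plain Python (deliberately *not* the Spark API) of what incremental stateful aggregation means: between micro-batches, only the per-key aggregate survives, while the raw input rows can be thrown away. The function and variable names here are made up for illustration.

```python
# Toy illustration (plain Python, not the Spark API) of incremental
# streaming aggregation: the engine keeps only the running per-key
# counts as state, never the unbounded table of raw input rows.

def process_micro_batch(state, batch):
    """Fold one micro-batch of words into the running counts.

    `state` holds only the aggregate (word -> count); the raw rows
    in `batch` can be discarded as soon as this returns.
    """
    for word in batch:
        state[word] = state.get(word, 0) + 1
    return state

# Simulated stream arriving as micro-batches.
batches = [["cat", "dog"], ["dog"], ["cat", "cat", "bird"]]

state = {}
for batch in batches:
    state = process_micro_batch(state, batch)

print(state)  # {'cat': 3, 'dog': 2, 'bird': 1}
```

The point of the sketch: the memory footprint is proportional to the number of distinct keys in the state, not to the total number of rows ever received, which is why the unbounded "input table" in the docs is a logical concept rather than something held in the JVM.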