https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts
From the docs linked above:

> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then discards the source data.
> It only keeps around the minimal intermediate *state* data as required to
> update the result (e.g. intermediate counts in the earlier example).

On Tue, Aug 27, 2019 at 1:21 PM Nick Dawes <nickdawe...@gmail.com> wrote:

> I have a quick newbie question.
>
> Spark Structured Streaming creates an unbounded dataframe that keeps
> appending rows to it.
>
> So what's the max size of data it can hold? What if the size becomes
> bigger than the JVM heap? Will it spill to disk? I'm using S3 as storage.
> So will it write temp data on S3 or on the local file system of the
> cluster?
>
> Nick
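To make the "intermediate counts" point from the docs concrete, here is a toy sketch in plain Python (deliberately *not* the Spark API) of what incremental stateful aggregation means: between micro-batches, only the per-key aggregate survives, while the raw input rows can be thrown away. The function and variable names here are made up for illustration.

```python
# Toy illustration (plain Python, not the Spark API) of incremental
# streaming aggregation: the engine keeps only the running per-key
# counts as state, never the unbounded table of raw input rows.

def process_micro_batch(state, batch):
    """Fold one micro-batch of words into the running counts.

    `state` holds only the aggregate (word -> count); the raw rows
    in `batch` can be discarded as soon as this returns.
    """
    for word in batch:
        state[word] = state.get(word, 0) + 1
    return state

# Simulated stream arriving as micro-batches.
batches = [["cat", "dog"], ["dog"], ["cat", "cat", "bird"]]

state = {}
for batch in batches:
    state = process_micro_batch(state, batch)

print(state)  # {'cat': 3, 'dog': 2, 'bird': 1}
```

The point of the sketch: the memory footprint is proportional to the number of distinct keys in the state, not to the total number of rows ever received, which is why the unbounded "input table" in the docs is a logical concept rather than something held in the JVM.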