Re: Structured Streaming Dataframe Size

2019-08-29 Thread Tathagata Das
Responses inline.

On Wed, Aug 28, 2019 at 8:42 AM Nick Dawes wrote:
> Thank you, TD. Couple of follow up questions please.
>
> 1) "It only keeps around the minimal intermediate state data"
>
> How do you define "minimal" here? Is there a configuration property to
> control the time or size of St

Re: Structured Streaming Dataframe Size

2019-08-28 Thread Nick Dawes
Thank you, TD. Couple of follow up questions please.

1) "It only keeps around the minimal intermediate state data"

How do you define "minimal" here? Is there a configuration property to control the time or size of Streaming Dataframe?

2) I'm not writing anything out to any database or S3. My req

Re: Structured Streaming Dataframe Size

2019-08-27 Thread Tathagata Das
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts

> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then d
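
To make the quoted idea concrete, here is a rough plain-Python sketch of incremental processing. It is only an illustration of the concept, not Spark code and not how Spark implements it. The class name RunningWordCount and the method process_batch are made up for this example. The point is that each batch of input rows is folded into a small per-key state and is not kept afterwards, so memory follows the number of distinct keys rather than the total amount of data seen.

```python
from collections import defaultdict
from typing import Dict, Iterable


class RunningWordCount:
    """Hypothetical helper: keeps per-word counts (the state), never the input rows."""

    def __init__(self) -> None:
        # The only thing retained between batches.
        self.state: Dict[str, int] = defaultdict(int)

    def process_batch(self, rows: Iterable[str]) -> Dict[str, int]:
        # Fold the newly arrived rows into the running counts.
        for row in rows:
            for word in row.split():
                self.state[word] += 1
        # No reference to `rows` is stored on the object;
        # return a snapshot of the updated result.
        return dict(self.state)


counter = RunningWordCount()
first = counter.process_batch(["cat dog", "dog"])
second = counter.process_batch(["cat owl"])
print(first)
print(second)
```

After the second batch the object holds three counters, regardless of how many rows have passed through it.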