The initial state is stored in a Parquet file, which is effectively a static Dataset. I see there is a Jira open for full joins between streaming and static Datasets in Structured Streaming (SPARK-20002 <https://issues.apache.org/jira/browse/SPARK-20002>). Once that Jira is completed, a full join against the initial state should be possible.
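
In the meantime, something along the lines of your suggestion (fetching the stored state when a GroupState is first initialized) might look roughly like the sketch below. It is only an outline under several assumptions: the Event and RunningCount types, the column layout of the Parquet file, both paths, and the JSON source are all made up for illustration, and the initial state is assumed to be small enough to collect and broadcast.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical types, for illustration only.
case class Event(key: String, value: Long)
case class RunningCount(key: String, total: Long)

object InitialStateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("initial-state-sketch").getOrCreate()
    import spark.implicits._

    // Assumption: the Parquet file has columns (key: String, total: Long)
    // and fits in memory, so it can be collected and broadcast.
    val initial: Map[String, Long] = spark.read
      .parquet("/path/to/initial-state.parquet") // hypothetical path
      .as[RunningCount]
      .collect()
      .map(rc => rc.key -> rc.total)
      .toMap
    val initialBc = spark.sparkContext.broadcast(initial)

    // Assumption: a file-based JSON source that yields Event rows.
    val events: Dataset[Event] = spark.readStream
      .schema(Encoders.product[Event].schema)
      .json("/path/to/events") // hypothetical path
      .as[Event]

    val counts = events
      .groupByKey(_.key)
      .mapGroupsWithState[Long, RunningCount](GroupStateTimeout.NoTimeout) {
        (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
          // First time this key is seen: seed from the stored initial state.
          val start =
            if (state.exists) state.get
            else initialBc.value.getOrElse(key, 0L)
          val total = start + rows.map(_.value).sum
          state.update(total)
          RunningCount(key, total)
      }

    // mapGroupsWithState queries must run in Update output mode.
    counts.writeStream
      .outputMode(OutputMode.Update())
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

The gap with this approach is that keys which exist only in the initial state, and never arrive on the stream, are never emitted. That is the part a full join or a built-in initial state would cover.
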
For mapGroupsWithState, it would be great if you could provide an initialState Dataset with Key -> State initial values.

On 5 May 2017 at 23:49, Tathagata Das <[email protected]> wrote:

> Can you explain how your initial state is stored? Is it a file, or is it
> in a database?
> If it's in a database, then when you initialize the GroupState, you can
> fetch it from the database.
>
> On Fri, May 5, 2017 at 7:35 AM, Patrick McGloin <[email protected]>
> wrote:
>
>> Hi all,
>>
>> With Spark Structured Streaming, is there a possibility to set an
>> "initial state" for a query?
>>
>> Using a join between a streaming Dataset and a static Dataset does not
>> support full joins.
>>
>> Using mapGroupsWithState to create a GroupState does not support an
>> initialState (as the Spark Streaming StateSpec did).
>>
>> Are there any plans to add support for initial states? Or is there
>> already a way to do so?
>>
>> Best regards,
>> Patrick
