1. Just loop like this:

    def startQuery(): StreamingQuery = {
      // Define the dataframes and start the query
    }

    // Call this on the main thread
    while (notShutdown) {
      val query = startQuery()
      // Returns after refreshIntervalMs, or earlier if the query stops or fails
      query.awaitTermination(refreshIntervalMs)
      query.stop()
      // Refresh the static data here before the next iteration restarts the query
    }
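To make the "refresh static data" part concrete, here is a rough filled-in version of startQuery(). The paths, Kafka topic, and join key below are placeholders, not anything from your code; the important bits are re-reading the static dataframe inside startQuery() and reusing the same checkpoint location on every restart, so the query resumes from where it left off.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQuery

    val spark = SparkSession.builder.appName("periodic-refresh").getOrCreate()
    val refreshIntervalMs = 10 * 60 * 1000L // e.g. restart every 10 minutes

    def startQuery(): StreamingQuery = {
      // Re-read the static data so each restarted query sees the latest contents
      val staticDf = spark.read.parquet("/data/lookup") // placeholder path

      val streamDf = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
        .option("subscribe", "events")                       // placeholder topic
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

      // Stream-static join: the static side is fixed for the lifetime of this
      // query, which is why it has to be restarted to pick up new static data
      streamDf.join(staticDf, Seq("key")) // placeholder join key
        .writeStream
        .format("parquet")
        .option("path", "/data/out")                      // placeholder
        .option("checkpointLocation", "/data/checkpoint") // same location across restarts
        .start()
    }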
2. Yes, stream-stream joins are in 2.3.0, which will be released soon. RC3 is available if you want to test it right now: https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/.

On Wed, Feb 14, 2018 at 3:34 AM, Appu K <kut...@gmail.com> wrote:

> TD,
>
> Thanks a lot for the quick reply :)
>
> Did I understand it right that in the main thread, to wait for the
> termination of the context, I'll not be able to use
> outStream.awaitTermination() [since I'll be closing it inside another
> thread]?
>
> What would be a good approach to keep the main app long-running if I have
> to restart queries?
>
> Should I just wait for 2.3, where I'll be able to join two structured
> streams (if the release is just a few weeks away)?
>
> Appreciate all the help!
>
> thanks
> Appu
>
> On 14 February 2018 at 4:41:52 PM, Tathagata Das (
> tathagata.das1...@gmail.com) wrote:
>
> Let me fix my mistake :)
> What I suggested in that earlier thread does not work. A streaming query
> that joins a streaming dataset with a batch view does not correctly pick
> up changes when the view is updated. It works only when you restart the
> query. That is:
> - stop the query,
> - recreate the dataframes,
> - start the query on the new dataframe, using the same checkpoint location
> as the previous query.
>
> Note that you don't need to restart the whole process/cluster/application,
> just restart the query in the same process/cluster/application. This
> should be very fast (within a few seconds). So, unless you have latency
> SLAs of 1 second, you can periodically restart the query without
> restarting the process.
>
> Apologies for my misdirections in that earlier thread. Hope this helps.
>
> TD
>
> On Wed, Feb 14, 2018 at 2:57 AM, Appu K <kut...@gmail.com> wrote:
>
>> More specifically,
>>
>> Quoting TD from the previous thread:
>> "Any streaming query that joins a streaming dataframe with the view will
>> automatically start using the most updated data as soon as the view is
>> updated"
>>
>> Wondering if I'm doing something wrong in
>> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>>
>> My streaming dataframe is not using the updated data, even though the
>> view is updated!
>>
>> Thank you
>>
>> On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote:
>>
>> Hi,
>>
>> I had followed the instructions from the thread
>> https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3CD1315D33-41CD-4ba3-8b77-0879f3669...@qvantel.com%3E
>> while trying to periodically reload a static data frame that gets joined
>> to a structured streaming query.
>>
>> However, the streaming query results do not reflect the data from the
>> refreshed static data frame.
>>
>> Code is here:
>> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>>
>> I'm using Spark 2.2.1. Any pointers would be highly helpful.
>>
>> Thanks a lot
>>
>> Appu
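PS regarding point 2: once you are on 2.3, a stream-stream join looks roughly like this. This is a minimal sketch using the built-in rate source with made-up column names (impressionAdId, clickAdId, and the event-time columns); the watermarks bound how long the engine buffers each side of the join while waiting for matches.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.expr

    val spark = SparkSession.builder.appName("stream-stream-join").getOrCreate()

    // Two illustrative streams derived from the built-in rate source
    val impressions = spark.readStream.format("rate").load()
      .selectExpr("value AS impressionAdId", "timestamp AS impressionTime")
    val clicks = spark.readStream.format("rate").load()
      .selectExpr("value AS clickAdId", "timestamp AS clickTime")

    // Inner join on key, constrained to a one-hour event-time window
    val joined = impressions
      .withWatermark("impressionTime", "2 hours")
      .join(
        clicks.withWatermark("clickTime", "3 hours"),
        expr("clickAdId = impressionAdId AND " +
             "clickTime >= impressionTime AND " +
             "clickTime <= impressionTime + interval 1 hour"))

    joined.writeStream.format("console").start().awaitTermination()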