Unfortunately, I think this currently might require the old API.
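For concreteness, idea #2 from the thread below would look roughly like this Scala sketch; the JDBC connection details, table name, view name, and refresh interval are placeholders, not actual names from this thread:

  import java.util.concurrent.{Executors, TimeUnit}
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("RefreshStaticView").getOrCreate()

  // Read the slow-moving table as a plain batch DataFrame and expose it as a view.
  def refreshStaticView(): Unit = {
    val staticDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/dbname") // placeholder URL
      .option("dbtable", "lookup_table")                     // placeholder table
      .load()
    // Re-registering under the same name swaps what the name resolves to,
    // so streaming queries joining against it pick up fresh data.
    staticDf.createOrReplaceTempView("static_lookup")
  }

  refreshStaticView()

  // Background thread that rebuilds the view periodically (interval is arbitrary).
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = refreshStaticView()
  }, 10, 10, TimeUnit.MINUTES)

A streaming query can then reference spark.table("static_lookup") in its join as usual.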
Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Fri., 21 Apr 2017 at 05:58:

Idea #2 probably suits my needs better, because

- Streaming query does not have a source database connector yet.
- My source database table is big, so an in-memory table could be too huge for the driver to handle.

Thanks for the cool ideas, TD!

Regards,
Hemanth

From: Tathagata Das <tathagata.das1...@gmail.com>
Date: Friday, 21 April 2017 at 0.03
To: Hemanth Gudela <hemanth.gud...@qvantel.com>
Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

Here are a couple of ideas.

1. You can set up a Structured Streaming query to update an in-memory table. Look at the memory sink in the programming guide:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks
You can then query the latest table using the specified table name, and also join that table with another stream. However, note that this in-memory table is maintained in the driver, so you have to be careful about the size of the table.

2. If you cannot define a streaming query on the slow-moving data because no connector is available for that source, you can always define a batch DataFrame, register it as a view, and then run a background thread that periodically creates a new DataFrame with updated data and re-registers it as a view under the same name. Any streaming query that joins a streaming DataFrame with the view will automatically start using the most updated data as soon as the view is updated.

Hope this helps.

On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:

Thanks Georg for your reply, but I am not sure I fully understood your answer.

If you meant joining two streams (one reading Kafka, and another reading a database table), then I think that is not possible, because

1. According to the documentation <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>, Structured Streaming does not support a database as a streaming source.
2. Joining two streams is not possible yet.

Regards,
Hemanth

From: Georg Heiler <georg.kf.hei...@gmail.com>
Date: Thursday, 20 April 2017 at 23.11
To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

What about treating the static data as a (slow) stream as well?

Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Thu., 20 Apr 2017 at 22:09:

Hello,

I am working on a use case where I need to join a streaming data frame with a static data frame. The streaming data frame continuously receives data from Kafka topics, whereas the static data frame fetches data from a database table.

However, because the underlying database table is updated often, I must somehow refresh my static data frame periodically so that it picks up the latest information from the table.

My questions:

1. Is it possible to periodically refresh the static data frame?
2. If refreshing the static data frame is not possible, is there a mechanism to automatically stop and restart the Spark Structured Streaming job, so that every time the job restarts, the static data frame is rebuilt with the latest information from the underlying database table?

3. If 1) and 2) are not possible, please suggest alternatives to achieve the requirement described above.

Thanks,
Hemanth
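P.S. For concreteness, a rough Scala sketch of the setup described above; the Kafka brokers and topic, the JDBC connection details, and the join column "id" are placeholders, not actual names:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("StreamStaticJoin").getOrCreate()

  // Streaming side: continuously reads from a Kafka topic.
  val streamDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder brokers
    .option("subscribe", "events")                    // placeholder topic
    .load()
    .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

  // Static side: read once over JDBC at query start -- this is exactly
  // the data that goes stale as the underlying table is updated.
  val staticDf = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/dbname") // placeholder URL
    .option("dbtable", "lookup_table")                     // placeholder table
    .load()

  // Stream-static join keyed on a common column.
  val joined = streamDf.join(staticDf, "id")

  joined.writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()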