Unfortunately, I think this currently might require the old API.
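For concreteness, idea #2 from the thread below would look roughly like this Scala sketch; the JDBC connection details, table name, view name, and refresh interval are placeholders, not actual names from this thread:

  import java.util.concurrent.{Executors, TimeUnit}
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("RefreshStaticView").getOrCreate()

  // Read the slow-moving table as a plain batch DataFrame and expose it as a view.
  def refreshStaticView(): Unit = {
    val staticDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/dbname") // placeholder URL
      .option("dbtable", "lookup_table")                     // placeholder table
      .load()
    // Re-registering under the same name swaps what the name resolves to,
    // so streaming queries joining against it pick up fresh data.
    staticDf.createOrReplaceTempView("static_lookup")
  }

  refreshStaticView()

  // Background thread that rebuilds the view periodically (interval is arbitrary).
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = refreshStaticView()
  }, 10, 10, TimeUnit.MINUTES)

A streaming query can then reference spark.table("static_lookup") in its join as usual.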
Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Fri., 21 Apr 2017 at 05:58:

Idea #2 probably suits my needs better, because

- Streaming query does not have a source database connector yet.
- My source database table is big, so an in-memory table could be too huge for the driver to handle.

Thanks for the cool ideas, TD!

Regards,
Hemanth

From: Tathagata Das <tathagata.das1...@gmail.com>
Date: Friday, 21 April 2017 at 0.03
To: Hemanth Gudela <hemanth.gud...@qvantel.com>
Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

Here are a couple of ideas.

1. You can set up a Structured Streaming query to update an in-memory table. Look at the memory sink in the programming guide:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks
You can then query the latest table using the specified table name, and also join that table with another stream. However, note that this in-memory table is maintained in the driver, so you have to be careful about the size of the table.

2. If you cannot define a streaming query on the slow-moving data because no connector is available for that source, you can always define a batch DataFrame, register it as a view, and then run a background thread that periodically creates a new DataFrame with updated data and re-registers it as a view under the same name. Any streaming query that joins a streaming DataFrame with the view will automatically start using the most updated data as soon as the view is updated.

Hope this helps.

On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:

Thanks Georg for your reply, but I am not sure I fully understood your answer.

If you meant joining two streams (one reading Kafka, and another reading a database table), then I think that is not possible, because

1. According to the documentation <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>, Structured Streaming does not support a database as a streaming source.
2. Joining two streams is not possible yet.

Regards,
Hemanth

From: Georg Heiler <georg.kf.hei...@gmail.com>
Date: Thursday, 20 April 2017 at 23.11
To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

What about treating the static data as a (slow) stream as well?

Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Thu., 20 Apr 2017 at 22:09:

Hello,

I am working on a use case where I need to join a streaming data frame with a static data frame. The streaming data frame continuously receives data from Kafka topics, whereas the static data frame fetches data from a database table.

However, because the underlying database table is updated often, I must somehow refresh my static data frame periodically so that it picks up the latest information from the table.

My questions:

1. Is it possible to periodically refresh the static data frame?
2. If refreshing the static data frame is not possible, is there a mechanism to automatically stop and restart the Spark Structured Streaming job, so that every time the job restarts, the static data frame is rebuilt with the latest information from the underlying database table?

3. If 1) and 2) are not possible, please suggest alternatives to achieve the requirement described above.

Thanks,
Hemanth
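P.S. For concreteness, a rough Scala sketch of the setup described above; the Kafka brokers and topic, the JDBC connection details, and the join column "id" are placeholders, not actual names:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("StreamStaticJoin").getOrCreate()

  // Streaming side: continuously reads from a Kafka topic.
  val streamDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder brokers
    .option("subscribe", "events")                    // placeholder topic
    .load()
    .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

  // Static side: read once over JDBC at query start -- this is exactly
  // the data that goes stale as the underlying table is updated.
  val staticDf = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/dbname") // placeholder URL
    .option("dbtable", "lookup_table")                     // placeholder table
    .load()

  // Stream-static join keyed on a common column.
  val joined = streamDf.join(staticDf, "id")

  joined.writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()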