One general piece of advice: if you want to run the batch jobs
concurrently with the Spark Streaming jobs, put them in different fair
scheduler pools and give the streaming pool higher priority, so that
the batch jobs have as little impact as possible on the streaming jobs.
See the Spark documentation on fair scheduler pools.
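
As a rough sketch of what that could look like (not a tested setup):
the pool names, weights, and helper functions below are made up for
illustration, and it assumes the batch retrain and the streaming job
are submitted from the same Spark application, since fair scheduler
pools only arbitrate between jobs inside one SparkContext.

```python
import xml.etree.ElementTree as ET

# Hypothetical pools: name -> (weight, minShare).
# A higher weight gives a pool a larger share relative to other active
# pools; minShare is a minimum number of CPU cores the pool should get.
POOLS = {"streaming": (4, 2), "batch": (1, 0)}


def build_allocation_xml(pools):
    """Render the contents of a fair scheduler allocation file."""
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


def spark_conf_settings(allocation_file):
    """Spark properties that turn on fair scheduling with that file."""
    return {
        "spark.scheduler.mode": "FAIR",
        "spark.scheduler.allocation.file": allocation_file,
    }


def run_in_pool(sc, pool_name, job):
    """Assign jobs submitted from the calling thread to pool_name, then run job().

    `sc` is expected to be a SparkContext; `job` is any zero-argument
    callable that triggers Spark actions (for example the retraining).
    """
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    return job()
```

The XML string would be written to a file whose path is passed to
spark_conf_settings(), and the retraining would then be started with
something like run_in_pool(sc, "batch", retrain), where retrain is
your own function. If the retrain runs as a separate Spark application
instead, pools do not apply and the split has to be handled by the
cluster manager.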

On Tue, Dec 15, 2015 at 2:10 AM, atbrew <atb...@gmail.com> wrote:

> Hi,
>     I have a long-running job (a decision tree trained on a large
> amount of historical data) that needs to be retrained periodically, on
> a daily, weekly, or longer basis.
>
> These models are used in Spark Streaming to score incoming data. I would
> like to understand what the best practice is for triggering the retrain.
>   - Should the Spark batch job live in complete isolation from the
>     streaming one?
>   - Should the streaming job somehow trigger the long-running batch
>     job? If so, how would you recommend doing it?
>
> Does anyone know of a good blog post or article that gives a heads-up
> on what a system design for this might look like?
>
> Thanks a Million,
> Anthony
