How are you getting your data from the database? Are you using JDBC?

Can you call the database first (assuming the data is the same), put it in a
temp table in Spark, cache it for the duration of the window length, and use
the data from the cached table?
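
Something along these lines may work (just a rough sketch assuming Spark 1.6
with a SQLContext; the JDBC URL, credentials and table names below are
placeholders):

  val jdbcDF = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/mydb")   // placeholder URL
    .option("dbtable", "scratchpad.source_table")            // placeholder table
    .option("user", "scratchpad")
    .option("password", "xxxx")
    .load()

  // Register as a temp table and cache it so queries within the window
  // length read from memory rather than hitting the database again
  jdbcDF.registerTempTable("tmp_source")
  sqlContext.cacheTable("tmp_source")

  val result = sqlContext.sql("SELECT * FROM tmp_source")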

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 31 May 2016 at 04:19, Malcolm Lockyer <malcolm.lock...@hapara.com> wrote:

> On Tue, May 31, 2016 at 3:14 PM, Darren Govoni <dar...@ontrenet.com>
> wrote:
> > Well that could be the problem. A SQL database is essentially a big
> > synchronizer. If you have a lot of Spark tasks all bottlenecking on a
> > single database socket (is the database clustered or colocated with
> > Spark workers?) then you will have blocked threads on the database
> > server.
>
> Totally agree this could be a big killer to scaling up; we are
> planning to migrate. But in the meantime we are seeing such big issues
> with test data of only a few records (1, 2, 1024, etc.) produced to
> Kafka. Currently the database is NOT busy (CPU, memory and IO usage
> on the DB are tiny).
>