Mich's idea is quite good; if I were you, I would follow it.
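
For what it is worth, here is a minimal sketch of that approach, assuming Spark 1.6 with the JDBC DataFrame source; the connection URL, table name and credentials below are placeholders, not your actual setup:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("jdbc-cache-sketch")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Pull the reference data from the database once over JDBC.
// url, dbtable, user and password are placeholder values.
val lookupDF = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:postgresql://dbhost:5432/mydb",
  "dbtable"  -> "public.lookup_table",
  "user"     -> "spark",
  "password" -> "secret"
)).load()

// Register it as a temp table and cache it, so each streaming micro-batch
// joins against the in-memory copy instead of hitting the database.
lookupDF.registerTempTable("lookup_cached")
sqlContext.cacheTable("lookup_cached")

// When the window length elapses, refresh the cache:
// sqlContext.uncacheTable("lookup_cached"), then re-read and cacheTable again.

That way the database is queried once per window instead of once per task.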

Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman

2016-05-31 6:37 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>:

> How are you getting your data from the database? Are you using JDBC?
>
> Could you call the database first, put the data in a temp table in Spark,
> cache it for the duration of the window length, and use the data from the
> cached table (assuming the data stays the same)?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 31 May 2016 at 04:19, Malcolm Lockyer <malcolm.lock...@hapara.com>
> wrote:
>
>> On Tue, May 31, 2016 at 3:14 PM, Darren Govoni <dar...@ontrenet.com>
>> wrote:
>> > Well, that could be the problem. A SQL database is essentially a big
>> > synchronizer. If you have a lot of Spark tasks all bottlenecking on a
>> > single database socket (is the database clustered or colocated with the
>> > Spark workers?), then you will have blocked threads on the database
>> > server.
>>
>> Totally agree this could be a big killer to scaling up, and we are
>> planning to migrate. But in the meantime we are seeing these big issues
>> with test data of only a few records (1, 2, 1024, etc.) produced to
>> Kafka. Currently the database is NOT busy (CPU, memory and IO usage
>> on the DB are tiny).
>>
>>
>>
>
