Mich's idea is quite good; if I were you, I would follow it.

Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman
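A rough sketch of what Mich describes (quoted below), assuming a Spark 1.6-style SQLContext and a JDBC-reachable database; the JDBC URL, table name, and credentials are placeholders, not your actual setup:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object CachedJdbcLookup {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("cached-jdbc-lookup"))
        val sqlContext = new SQLContext(sc)

        // Pull the reference data once over JDBC.
        // URL, table name and credentials below are placeholders.
        val df = sqlContext.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/mydb")
          .option("dbtable", "lookup_table")
          .option("user", "spark_user")
          .option("password", "***")
          .load()

        // Cache it and register it as a temp table so each streaming batch
        // queries the in-memory copy instead of opening a database socket per task.
        df.cache()
        df.registerTempTable("lookup_cached")

        // Example query against the cached copy.
        sqlContext.sql("SELECT COUNT(*) FROM lookup_cached").show()
      }
    }

Each micro-batch can then join against lookup_cached instead of hitting the database from every task; unpersist and reload it when the window rolls over if the source data changes.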
2016-05-31 6:37 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>:

> How are you getting your data from the database? Are you using JDBC?
>
> Can you actually call the database first (assuming the same data), put it
> in a temp table in Spark, cache it for the duration of the window length,
> and use the data from the cached table?
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 31 May 2016 at 04:19, Malcolm Lockyer <malcolm.lock...@hapara.com> wrote:
>
>> On Tue, May 31, 2016 at 3:14 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>> > Well that could be the problem. A SQL database is essentially a big
>> > synchronizer. If you have a lot of Spark tasks all bottlenecking on a
>> > single database socket (is the database clustered or colocated with
>> > Spark workers?) then you will have blocked threads on the database
>> > server.
>>
>> Totally agree this could be a big killer to scaling up; we are planning
>> to migrate. But in the meantime we are seeing such big issues with test
>> data of only a few records (1, 2, 1024, etc.) produced to Kafka.
>> Currently the database is NOT busy (CPU, memory and IO usage on the DB
>> is tiny).