Could you share whether you've been able to run the streaming job over a
long period of time? I did something very similar and the executors seem to
run out of memory (how quickly depends on how much data/memory they get).
Just curious what your experience is.
On Fri, Sep 26, 2014 at 12:31 AM, maddenpj wrote:
Yup it's all in the gist:
https://gist.github.com/maddenpj/5032c76aeb330371a6e6
Lines 6-9 deal with setting up the driver specifically. This sets the driver
up once per partition, so the connection pool is kept around and reused
across records instead of being created for every record.
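For anyone finding this thread later, a minimal sketch of that per-partition
pattern looks something like the following. This is not the gist verbatim:
the DStream, JDBC URL, table, and credentials are placeholders, and the gist
may use an actual connection-pool library rather than raw DriverManager.

import java.sql.DriverManager

// stateDstream is a hypothetical DStream[(String, Long)] of running counts.
stateDstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // One connection per partition, opened on the executor, instead of
    // one per record (or one on the driver, which wouldn't serialize).
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/mydb", "user", "password")
    val stmt = conn.prepareStatement(
      "INSERT INTO counts (k, v) VALUES (?, ?) ON DUPLICATE KEY UPDATE v = ?")
    try {
      partition.foreach { case (k, v) =>
        stmt.setString(1, k)
        stmt.setLong(2, v)
        stmt.setLong(3, v)
        stmt.executeUpdate()
      }
    } finally {
      stmt.close()
      conn.close()
    }
  }
}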
Thanks for the update. I'm interested in writing the results to MySQL as
well; can you shed some light or share a code sample on how you set up the
driver/connection pool/etc.?
On Thu, Sep 25, 2014 at 4:00 PM, maddenpj wrote:
> Update for posterity, so once again I solved the problem shortly after
> posting to the mailing list.
Update for posterity, so once again I solved the problem shortly after
posting to the mailing list. updateStateByKey uses the default partitioner,
which in my case seemed to be set to a single partition. Changing my call
from .updateStateByKey[Long](updateFn) to
.updateStateByKey[Long](updateFn, numPartitions) fixed it.
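For reference, the before/after looks roughly like this, where pairs is the
keyed DStream being aggregated; updateFn and the partition count here are
illustrative, not the actual job's values:

// updateFn folds each batch's new values into the running total.
val updateFn = (values: Seq[Long], state: Option[Long]) =>
  Some(values.sum + state.getOrElse(0L))

// Before: the default partitioner, which here resolved to one partition.
val state = pairs.updateStateByKey[Long](updateFn)

// After: spread the state RDD over an explicit number of partitions.
val numPartitions = 32 // illustrative; size this to your cluster
val stateFixed = pairs.updateStateByKey[Long](updateFn, numPartitions)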