Hi,

From your reference I can see that you are running in local mode with two cores ("local[2]"). That is not standalone mode.
Can you please clarify whether you start the master and slave processes? Those are what standalone mode uses:

sbin/start-master.sh
sbin/start-slaves.sh

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 25 July 2016 at 18:21, on <schueler_1...@web.de> wrote:

> Dear all,
>
> I am running Spark on one host ("local[2]"), doing calculations like this
> on a socket stream:
>
> mainStream = socketStream.filter(lambda msg:
>     msg['header'].startswith('test')).map(lambda x: (x['host'], x))
> s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
> s2 = mainStream.updateStateByKey(updateSecond,
>     initialRDD=initialMachineStates).map(lambda x: (2, x))
> out.join(bla2).foreachRDD(no_out)
>
> I evaluated that each calculation alone has a processing time of about
> 400 ms, but the processing time of the code above is over 3 s on average.
>
> I know there are a lot of unknown parameters, but does anybody have hints
> on how to tune this code / system? I have already changed a lot of
> parameters, such as #executors, #cores and so on.
>
> Thanks in advance and best regards,
> on
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
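For reference, the local-vs-standalone distinction above comes down to the master URL the application is started with: "local[N]" runs the driver and executors in a single JVM with N worker threads, while "spark://host:7077" connects to a standalone master started by the scripts mentioned in the reply. A minimal sketch of that mapping (the `classify_master` helper and the example host name are illustrative, not from the thread):

```python
def classify_master(master_url):
    """Classify a Spark master URL into a deployment mode.

    "local" / "local[N]" run everything in one JVM; "spark://" targets a
    standalone cluster; "yarn" and "mesos://" target those cluster managers.
    """
    if master_url == "local" or master_url.startswith("local["):
        return "local"
    if master_url.startswith("spark://"):
        return "standalone"
    if master_url.startswith("yarn"):
        return "yarn"
    if master_url.startswith("mesos://"):
        return "mesos"
    raise ValueError("unrecognised master URL: %r" % master_url)

print(classify_master("local[2]"))            # the OP's setting: local mode
print(classify_master("spark://node1:7077"))  # a standalone cluster
```

So with "local[2]" the start-master.sh / start-slaves.sh processes are never involved; switching to standalone means starting them and pointing the application at the spark:// URL of the master.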