Re: pyspark worker concurrency

2016-02-08 Thread Renyi Xiong
Never mind, I think pyspark is already doing async socket read/write, but on the Scala side in PythonRDD.scala.

On Sat, Feb 6, 2016 at 6:27 PM, Renyi Xiong wrote:
> Hi,
>
> is it a good idea to have 2 threads in a pyspark worker? - the main thread
> responsible for receiving and sending data over the socket while the other
> thread is calling user functions to process the data?

pyspark worker concurrency

2016-02-06 Thread Renyi Xiong
Hi,

is it a good idea to have 2 threads in a pyspark worker? - the main thread responsible for receiving and sending data over the socket while the other thread is calling user functions to process the data?

Since the CPU is idle (?) during network I/O, this should improve concurrency quite a bit. Can an expert answer this?
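[Editor's note: a minimal sketch of the two-thread layout the question describes, with the main thread on socket I/O and a background thread running the user function. `read_records`, `write_record`, and `user_func` are hypothetical stand-ins for the socket protocol and the user's lambda; this is not the actual PySpark worker implementation.]

```python
import threading
import queue

SENTINEL = object()  # marks the end of the input stream


def io_and_compute(read_records, write_record, user_func):
    # Bounded inbox applies back-pressure on reads; the outbox is left
    # unbounded so the compute thread never blocks on put (simplifies
    # the sketch at the cost of unbounded result buffering).
    inbox = queue.Queue(maxsize=64)
    outbox = queue.Queue()

    def compute():
        # Background thread: runs the user function while the main
        # thread is blocked on network I/O.
        while True:
            rec = inbox.get()
            if rec is SENTINEL:
                outbox.put(SENTINEL)
                return
            outbox.put(user_func(rec))

    worker = threading.Thread(target=compute, daemon=True)
    worker.start()

    # Main thread: receive over the socket, hand records to the compute
    # thread, and send back any results that are already finished.
    for rec in read_records():
        inbox.put(rec)
        while not outbox.empty():
            write_record(outbox.get())
    inbox.put(SENTINEL)

    # Drain the remaining results.
    while True:
        result = outbox.get()
        if result is SENTINEL:
            break
        write_record(result)
    worker.join()


# Example usage with in-memory stand-ins for the socket stream:
#   results = []
#   io_and_compute(lambda: iter(range(10)), results.append, lambda x: x * x)
```

As the 2016-02-08 reply notes, PySpark already overlaps computation with socket I/O, but from the JVM side in PythonRDD.scala, so a second Python-level thread like the one above would not be needed for that purpose.]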