Re: pyspark streaming DStream compute

2015-09-15 Thread Davies Liu
On Tue, Sep 15, 2015 at 1:46 PM, Renyi Xiong wrote:
> Can anybody help me understand why pyspark streaming uses a py4j callback to
> execute Python code while pyspark batch uses worker.py?

There are two kinds of callbacks in pyspark streaming:
1) one operates on RDDs: it takes an RDD and returns a new RDD

pyspark streaming DStream compute

2015-09-15 Thread Renyi Xiong
Can anybody help me understand why pyspark streaming uses a py4j callback to execute Python code while pyspark batch uses worker.py? Regarding pyspark streaming: is the py4j callback used only for DStreams, while worker.py is still used for RDDs?

Thanks,
Renyi.
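Davies Liu's reply separates callbacks that take an RDD and return a new RDD from other Python code. The sketch below is a toy, Spark-free model of that layering, for illustration only: `ToyRDD` and `ToyDStream` are hypothetical classes, not PySpark APIs. It assumes the usual PySpark arrangement, in which an RDD-level function (the kind passed to `DStream.transform`) runs on the driver, invoked from the JVM once per batch through a py4j callback, while record-level functions (such as the one passed to `map`) are serialized and executed by worker.py on the executors. In PySpark, `DStream.map` is itself built on top of `transform`, which is why the toy `map` wraps a record-level function inside an RDD-level one.

```python
# Toy, Spark-free model of the two callback levels in pyspark streaming.
# ToyRDD and ToyDStream are hypothetical stand-ins, not PySpark classes.


class ToyRDD:
    """An 'RDD' here is just a list of partitions, each a list of records."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map_partitions(self, f):
        # Record-level work: in real PySpark, f would be pickled and run
        # per partition by worker.py processes on the executors.
        return ToyRDD([list(f(iter(p))) for p in self.partitions])

    def collect(self):
        return [x for p in self.partitions for x in p]


class ToyDStream:
    """A 'DStream' is a list of per-batch RDDs plus RDD-level functions."""

    def __init__(self, batches, rdd_funcs=()):
        self.batches = batches
        self.rdd_funcs = list(rdd_funcs)

    def transform(self, func):
        # RDD-level work: in real PySpark, func runs on the driver and is
        # invoked from the JVM once per batch through a py4j callback.
        return ToyDStream(self.batches, self.rdd_funcs + [func])

    def map(self, f):
        # Wrap a record-level function in an RDD-level one, mirroring how
        # PySpark layers DStream.map on top of DStream.transform.
        return self.transform(
            lambda rdd: rdd.map_partitions(lambda it: (f(x) for x in it))
        )

    def compute(self):
        # Stand-in for the streaming scheduler: apply every RDD-level
        # function to each batch in turn, then gather the records.
        results = []
        for rdd in self.batches:
            for func in self.rdd_funcs:
                rdd = func(rdd)
            results.append(rdd.collect())
        return results


def keep_even(rdd):
    # An RDD-level function, like one passed to DStream.transform.
    return rdd.map_partitions(lambda it: (x for x in it if x % 2 == 0))


batches = [ToyRDD([[1, 2], [3]]), ToyRDD([[4], [5, 6]])]
stream = ToyDStream(batches)
squared = stream.transform(keep_even).map(lambda x: x * x)
print(squared.compute())  # [[4], [16, 36]]
```

In this model, `keep_even` sees a whole RDD once per batch, while the `lambda x: x * x` passed to `map` only ever sees individual records. That roughly matches the split the question asks about: the RDD-level layer corresponds to the py4j callback, and the record-level layer corresponds to worker.py.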