On Tue, Sep 15, 2015 at 1:46 PM, Renyi Xiong wrote:
> Can anybody help understand why pyspark streaming uses py4j callback to
> execute python code while pyspark batch uses worker.py?
There are two kinds of callbacks in PySpark Streaming:
1) one operates on RDDs: it takes an RDD and returns a new RDD
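To make the shape of that first kind of callback concrete, here is a minimal sketch in plain Python (no Spark required). The class and method names below are illustrative only, not the actual PySpark internals: the idea is that the JVM side, via the Py4J callback server, invokes a registered Python object at each batch interval, handing it a batch (modeled here as a list standing in for an RDD) and getting a transformed batch back.

```python
# Illustrative sketch of an RDD -> RDD callback, as used conceptually by
# PySpark Streaming's transform path. Names are hypothetical, not real
# PySpark internals.
class TransformCallback:
    def __init__(self, func):
        # user-supplied function: rdd -> rdd
        self.func = func

    def call(self, rdd):
        # in real PySpark Streaming, the JVM scheduler would invoke this
        # through the Py4J callback server at each batch interval
        return self.func(rdd)


# usage: double every element of the incoming batch
cb = TransformCallback(lambda rdd: [x * 2 for x in rdd])
print(cb.call([1, 2, 3]))  # -> [2, 4, 6]
```

This is only the RDD-to-RDD case; the user function itself still runs in the Python process, which is why the JVM needs a callback channel (Py4J) rather than the worker.py fork-and-pipe path used for plain batch RDD operations.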
> Regarding pyspark streaming: is the py4j callback only used for
> DStream, while worker.py is still used for RDD?
>
> thanks,
> Renyi.