Hello,
According to the PySpark Internals wiki, the PySpark worker communicates
over pipes, not sockets:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
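To make sure we mean the same thing, this is roughly what pipe-based
communication would look like. It is only a toy illustration I wrote, not
Spark code: a parent process talks to a forked child over the child's
stdin/stdout pipes.

# Toy illustration only (not Spark code): a parent process talking to a
# child over anonymous pipes (the child's stdin/stdout).
import subprocess
import sys

if len(sys.argv) > 1 and sys.argv[1] == "child":
    # Child: read one line from stdin (the pipe) and echo it back on stdout.
    line = sys.stdin.readline()
    sys.stdout.write("echo: " + line)
    sys.stdout.flush()
else:
    # Parent: spawn the child with its stdin/stdout redirected to pipes.
    proc = subprocess.Popen(
        [sys.executable, __file__, "child"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate("hello over a pipe\n")
    print(out.strip())  # prints: echo: hello over a pipe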

I checked the code in pyspark/worker.py:

if __name__ == '__main__':
    # Read a local port to connect to from stdin
    java_port = int(sys.stdin.readline())
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("127.0.0.1", java_port))
    sock_file = sock.makefile("rwb", 65536)
    main(sock_file, sock_file)

It actually uses a socket, not a pipe. Is there something I have missed?
Why does the PySpark worker use a socket rather than a pipe? Is it for
performance reasons?
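
For reference, here is a small self-contained sketch of the handshake I
understand the quoted worker.py code to be doing: the parent listens on a
loopback port, writes the port number to the child's stdin, and the child
connects back over TCP. This is only my toy reconstruction; in Spark the
parent side is the JVM, not a Python script.

# Toy reconstruction of the handshake implied by pyspark/worker.py
# (my own sketch, not Spark code).
import socket
import subprocess
import sys

if len(sys.argv) > 1 and sys.argv[1] == "worker":
    # Child side: same pattern as pyspark/worker.py.
    java_port = int(sys.stdin.readline())
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("127.0.0.1", java_port))
    sock_file = sock.makefile("rwb", 65536)
    sock_file.write(b"hello over a socket\n")
    sock_file.flush()
else:
    # Parent side: bind an ephemeral loopback port and hand it to the
    # child via its stdin.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    proc = subprocess.Popen(
        [sys.executable, __file__, "worker"],
        stdin=subprocess.PIPE,
    )
    proc.stdin.write((str(port) + "\n").encode("utf-8"))
    proc.stdin.flush()

    # Accept the child's connection and read what it sent back.
    conn, _ = server.accept()
    print(conn.makefile("rb").readline().decode().strip())
    proc.wait()
    conn.close()
    server.close()

If I am reading this correctly, the only thing passed over a pipe (stdin)
is the port number; the actual data exchange happens over the loopback
socket.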

-- 
Best & Regards
Cyanny LIANG
email: lgrcya...@gmail.com
