Hello,
According to the PySpark internals wiki, the PySpark worker uses pipes to communicate, not sockets:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
I have checked the code in pyspark/worker.py:
if __name__ == '__main__':
    # Read a local port to connect to from stdin
    java_port = int(sys.stdin.readline())
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("127.0.0.1", java_port))
    sock_file = sock.makefile("rwb", 65536)
    main(sock_file, sock_file)
It actually uses a socket, not a pipe, so I am wondering whether there is anything I missed.
Why does the PySpark worker use a socket rather than a pipe? Is it for performance reasons?
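
To make sure I am reading the handshake correctly, here is a minimal standalone sketch of what I understand from the snippet above. It is NOT PySpark's actual daemon/JVM code, just a hypothetical parent process standing in for the JVM side: it listens on a local ephemeral port, passes that port to a worker-like child over its stdin pipe, and then exchanges data with the child over the socket. The child source string and the "hello from worker" message are made up for illustration.

import socket
import subprocess
import sys

# Hypothetical child that mimics the handshake in pyspark/worker.py:
# read a local port from stdin, connect back over TCP, talk over the socket.
CHILD_SOURCE = r"""
import socket, sys
java_port = int(sys.stdin.readline())
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", java_port))
sock_file = sock.makefile("rwb", 65536)
sock_file.write(b"hello from worker\n")
sock_file.flush()
"""

# Parent ("JVM") side: listen on an ephemeral local port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Launch the worker and hand it the port over its stdin pipe.
worker = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                          stdin=subprocess.PIPE)
worker.stdin.write(("%d\n" % port).encode("ascii"))
worker.stdin.flush()

# All further data flows over the socket, not the stdin/stdout pipes.
conn, _ = server.accept()
print(conn.makefile("rb").readline())   # b'hello from worker\n'

worker.wait()
conn.close()
server.close()

If I read this correctly, the stdin pipe is only used to hand the worker the port number during startup, and the actual data exchange then happens over the local socket. Please correct me if I have misunderstood.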
--
Best & Regards
Cyanny LIANG
email: [email protected]