Hello,

According to the PySpark Internals wiki, the PySpark worker uses pipes to communicate, not sockets: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
However, I have checked the pyspark/worker.py code:

    if __name__ == '__main__':
        # Read a local port to connect to from stdin
        java_port = int(sys.stdin.readline())
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(("127.0.0.1", java_port))
        sock_file = sock.makefile("rwb", 65536)
        main(sock_file, sock_file)

It actually uses a socket, not a pipe. Is there anything I missed? Why does the PySpark worker use a socket instead of a pipe? Is it for performance reasons?

--
Best & Regards
Cyanny LIANG
email: lgrcya...@gmail.com
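P.S. For anyone following along, here is a minimal, self-contained sketch of the handshake pattern that worker.py snippet follows: the parent process (standing in for the JVM daemon) listens on a loopback port, starts a child Python process, and writes the port number to the child's stdin; the child reads the port and connects back over a local TCP socket. The CHILD_CODE string and the "hello from worker" message are made up for illustration and are not part of PySpark itself.

    import socket
    import subprocess
    import sys
    import textwrap

    # Source of the child process: same pattern as pyspark/worker.py --
    # read a port from stdin, then connect back over a loopback socket.
    CHILD_CODE = textwrap.dedent("""
        import socket, sys
        port = int(sys.stdin.readline())
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(("127.0.0.1", port))
        sock_file = sock.makefile("rwb", 65536)
        sock_file.write(b"hello from worker\\n")
        sock_file.flush()
    """)

    if __name__ == "__main__":
        # Parent plays the role of the JVM daemon: listen on a free port.
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]

        # Launch the child and hand it the port number over its stdin.
        child = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                                 stdin=subprocess.PIPE)
        child.stdin.write(f"{port}\n".encode())
        child.stdin.flush()

        # The child connects back; data then flows over the socket.
        conn, _ = listener.accept()
        print(conn.makefile("rb").readline())  # b'hello from worker\n'
        child.wait()

So stdin is only used once, to pass the port; the actual data exchange in this sketch (and, as far as I can tell from worker.py, in PySpark as well) happens over the socket.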