Madhusudhanan Chandrasekaran wrote:
> Hi:
>
> This question is not directed "entirely" at python only. But since
> I want to know how to do it in python, I am posting here.
>
> I am constructing a huge matrix (m x n), whose columns n are stored in
> smaller files. Once I read m such files, my matrix is complete. I
> want to pass this matrix as an input to another script of mine (I
> just have the binary.) Currently, the script reads a file (which is
> nothing but the matrix) and processes it. Is there any way of doing
> this in memory, without writing the matrix onto the disk?
>
> Since I have to repeat my experimentation for multiple iterations, it
> becomes expensive to write the matrix onto the disk.
>
> Thanks in advance. Help appreciated.
>
> -Madhu

Basically, you're asking about Inter-Process Communication (IPC), for which Python provides several interfaces to mechanisms provided by the operating system (whatever that may be). Here are a couple of commonly used methods:

Redirected I/O

Have a look at the popen functions in the os module, or better still the subprocess module (which is a higher-level interface to the same functionality). Specifically, the "Replacing the shell pipe line" example in the subprocess module's documentation should be interesting:

    output=`dmesg | grep hda`
    ==>
    p1 = Popen(["dmesg"], stdout=PIPE)
    p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
    output = p2.communicate()[0]

Here, the stdout of the "dmesg" process has been redirected to the stdin of the "grep" process. You could do something similar with your two scripts: the first script simply writes the content of the matrix in some format to stdout (e.g. print, sys.stdout.write), while the second script reads the content of the matrix from stdin (e.g. raw_input, which is called input in Python 3, or sys.stdin.read).

Here are some brutally simplistic scripts that demonstrate the method:

in.py
=====
    #!/usr/bin/env python
    #
    # I read integers from stdin until I encounter 0

    import sys

    while True:
        i = int(sys.stdin.readline())
        print("Read %d from stdin" % i)
        if i == 0:
            break

out.py
======
    #!/usr/bin/env python
    #
    # I write some numbers to stdout

    for i in [1, 2, 3, 4, 5, 0]:
        print(i)

run.py
======
    #!/usr/bin/env python
    #
    # I run out.py and in.py with a pipe between them, capture the
    # output of in.py and print it

    from subprocess import Popen, PIPE

    process1 = Popen(["./out.py"], stdout=PIPE)
    process2 = Popen(["./in.py"], stdin=process1.stdout, stdout=PIPE)
    # Close our copy of the pipe so out.py gets a broken pipe if in.py exits early
    process1.stdout.close()
    output = process2.communicate()[0]
    print(output.decode())

Sockets

Another form of IPC uses sockets to communicate between two processes (see the socket module or one of the higher-level modules like SocketServer, renamed socketserver in Python 3).
With this method, the second process would listen on a port (presumably on the localhost interface, although there's no reason it couldn't listen on a LAN interface, for example), and the first process would connect to that port and send the matrix data across it to the second process.

Summary

Given that your second script currently reads a file containing the complete matrix (if I understand your post correctly), it's probably easiest for you to use the Redirected I/O method. It's very similar to reading a file, although there are some differences, and sometimes one must be careful about closing pipe ends to avoid deadlocks. However, the sockets method has the advantage that you can easily move one of the processes onto a different machine.

There are other methods of IPC (for example, shared memory: see the mmap module). However, the two mentioned above are available on most platforms, whereas others may be specific to a given platform or have platform-specific subtleties (for example, mmap is only available on Windows and UNIX, and has a slightly different constructor on each).

HTH,

Dave.

--
http://mail.python.org/mailman/listinfo/python-list
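
The sockets method described above can be sketched as follows. This is only an illustration under several assumptions, not a drop-in solution: the matrix is sent as plain text (one row per line, values separated by spaces), the receiver reads until the sender closes the connection, and a thread stands in for the second process so the example fits in one file. The names serve_once, send_matrix and demo are made up for this sketch.

```python
import socket
import threading


def serve_once(listener, results):
    """Role of the second process: accept one connection and read it to EOF."""
    conn, _addr = listener.accept()
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # an empty read means the sender closed its end
            break
        chunks.append(data)
    conn.close()
    text = b"".join(chunks).decode("ascii")
    # Rebuild the matrix: one line per row, values separated by spaces
    results.append([[int(v) for v in line.split()] for line in text.splitlines()])


def send_matrix(port, matrix):
    """Role of the first process: connect to the port and send the matrix."""
    payload = "\n".join(" ".join(str(v) for v in row) for row in matrix)
    sock = socket.create_connection(("127.0.0.1", port))
    sock.sendall(payload.encode("ascii"))
    sock.close()  # closing signals end-of-data to the receiver


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    results = []
    # Assumption: a thread plays the part of the separate receiving process
    receiver = threading.Thread(target=serve_once, args=(listener, results))
    receiver.start()
    send_matrix(port, [[1, 2, 3], [4, 5, 6]])
    receiver.join()
    listener.close()
    return results[0]


if __name__ == "__main__":
    print(demo())
```

In a real setup the two roles would live in separate scripts, and a huge matrix would probably call for a more compact format than text. Note also that this only helps if the receiving program can be changed to read from a socket.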