Well,

let's say I have about a thousand files to be processed. I need to
extract text out of them, whatever the file type is (I use the Linux
"strings" command).

I want to do this in a multiprocessed way that takes advantage of multi-core PCs.

This is my current implementation:


import os
import subprocess, shlex

def __forcedParsing(fname):
        cmd = 'strings "%s"' % fname
        args = shlex.split(cmd)
        try:
                sp = subprocess.Popen(args, shell=False,
                                      stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE)
                out, err = sp.communicate()
        except OSError, e:
                # bind the exception instance; the OSError class itself
                # has no errno/strerror attributes
                print "Error no %s  Message %s" % (e.errno, e.strerror)
                return None

        if sp.returncode == 0:
                return out


def parseDocs():
        res = []
        for file_id, fname, ftype, dir in SESSION.all_docs:
                fp = os.path.join(dir, fname)
                res.append(__forcedParsing(fp))
        return res


Well, the problem is that I need the output from the subprocess, so I
have to read it using sp.communicate(). I need that to happen in a
multiprocessed way (via forking? polling?).

So here are my thoughts:

1) Without using fork(), could I iterate over the huge list of files on
the client side and issue multiple AJAX posts to the server, so that
each request is handled in its own thread thanks to Rocket? But might
this suffer performance issues on the client side?

2) Fork the current implementation and read the output via polling with
subprocess.poll()? Roughly like the sketch below.
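For (2), this is the kind of thing I have in mind: keep a handful of
strings processes in flight at once and poll them until they finish.
Untested sketch; max_procs is just a name I made up here, and note the
caveat that a child producing a lot of output can fill its pipe and
block before poll() ever reports it as done:

import subprocess, shlex, time

def parseDocsPolling(paths, max_procs=4):
        pending = list(paths)
        running = []     # (path, Popen) pairs still in flight
        results = {}
        while pending or running:
                # top up to max_procs concurrent strings processes
                while pending and len(running) < max_procs:
                        path = pending.pop()
                        args = shlex.split('strings "%s"' % path)
                        p = subprocess.Popen(args, stdout=subprocess.PIPE,
                                             stderr=subprocess.PIPE)
                        running.append((path, p))
                still_running = []
                for path, p in running:
                        if p.poll() is None:
                                still_running.append((path, p))
                        else:
                                # process finished; safe to drain its pipes
                                out, err = p.communicate()
                                if p.returncode == 0:
                                        results[path] = out
                running = still_running
                time.sleep(0.1)  # don't busy-wait
        return results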

Any ideas?
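Or would multiprocessing.Pool be the simplest route? A minimal sketch
of what I mean (untested; it assumes SESSION.all_docs has the same
layout as in my code above):

import os
from multiprocessing import Pool

def parseDocsParallel():
        # build the list of file paths up front, as in parseDocs()
        paths = [os.path.join(dir, fname)
                 for file_id, fname, ftype, dir in SESSION.all_docs]
        pool = Pool()  # defaults to one worker process per CPU core
        try:
                # fan the existing __forcedParsing out over all files;
                # results come back in the same order as paths
                res = pool.map(__forcedParsing, paths)
        finally:
                pool.close()
                pool.join()
        return res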
