Let's say I have about a thousand files to process. I need to extract text from them, whatever the file type is (I use the Linux `strings` command).
I want to do this in a multi-processed way, so that it takes advantage of multi-core PCs. This is my current implementation:

```python
import os
import shlex
import subprocess

def __forcedParsing(fname):
    cmd = 'strings "%s"' % fname
    args = shlex.split(cmd)
    try:
        sp = subprocess.Popen(args, shell=False,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
        out, err = sp.communicate()
    except OSError as e:
        print "Error no %s Message %s" % (e.errno, e.strerror)
        return None
    if sp.returncode == 0:
        return out

def parseDocs():
    res = []
    for file_id, fname, ftype, dir in SESSION.all_docs:
        fp = os.path.join(dir, fname)
        res.append(__forcedParsing(fp))
    return res
```

The problem is that I need the output from the subprocess, so I have to read it with `sp.communicate()`. I need that to run in parallel (via forking? polling?). Here are my thoughts:

1) Without using fork(), could I iterate over the huge list of files on the client side and issue multiple AJAX POSTs to the server? Each request would get its own thread because of Rocket, right? But might this cause performance problems on the client side?

2) Fork the current implementation and read the output via polling (`subprocess.poll()`)?

Any ideas?