On 20 Dec 2005 08:06:39 -0800, "sicvic" <[EMAIL PROTECTED]> wrote:
>Not homework...not even in school (do any universities even teach
>classes using python?). Just not a programmer. Anyways I should
>probably be more clear about what I'm trying to do.

Ok, not homework.

>
>Since I cant show the actual output file lets say I had an output file
>that looked like this:
>
>aaaaa bbbbb Person: Jimmy
>Current Location: Denver
>Next Location: Chicago
>----------------------------------------------
>aaaaa bbbbb Person: Sarah
>Current Location: San Diego
>Next Location: Miami
>Next Location: New York
>----------------------------------------------
>
>Now I want to put (and all recurrences of "Person: Jimmy")
>
>Person: Jimmy
>Current Location: Denver
>Next Location: Chicago
>
>in a file called jimmy.txt
>
>and the same for Sarah in sarah.txt
>
>The code I currently have looks something like this:
>
>import re
>import sys
>
>person_jimmy = open('jimmy.txt', 'w') #creates jimmy.txt
>person_sarah = open('sarah.txt', 'w') #creates sarah.txt
>
>f = open(sys.argv[1]) #opens output file
>#loop that goes through all lines and parses specified text
>for line in f.readlines():
>    if re.search(r'Person: Jimmy', line):
>        person_jimmy.write(line)
>    elif re.search(r'Person: Sarah', line):
>        person_sarah.write(line)
>
>#closes all files
>
>person_jimmy.close()
>person_sarah.close()
>f.close()
>
>However this only would produces output files that look like this:
>
>jimmy.txt:
>
>aaaaa bbbbb Person: Jimmy
>
>sarah.txt:
>
>aaaaa bbbbb Person: Sarah
>
>My question is what else do I need to add (such as an embedded loop
>where the if statements are?)
>so the files look like this
>
>aaaaa bbbbb Person: Jimmy
>Current Location: Denver
>Next Location: Chicago
>
>and
>
>aaaaa bbbbb Person: Sarah
>Current Location: San Diego
>Next Location: Miami
>Next Location: New York
>
>
>Basically I need to add statements that after finding that line copy
>all the lines following it and stopping when it sees
>'----------------------------------------------'
>
>Any help is greatly appreciated.
>
Ok, I generalized on your theme of extracting file chunks to named files,
where the beginning line has the file name. I made '.txt' a hardcoded extension.
I provided a way to direct the output to a (I guess not necessarily sub) directory.
Not tested beyond what you see. Tweak to suit.

----< extractfilesegs.py >--------------------------------------------------------
"""
Usage: [python] extractfilesegs [source [outdir [startpat [endpat]]]]
    where source is -tf for test file, a file name, or an open file
          outdir is a directory prefix that will be joined to output file names
          startpat is a regular expression with group 1 giving the extracted file name
          endpat is a regular expression whose match line is excluded and ends the segment
"""
import re, os

def extractFileSegs(linesrc, outdir='extracteddata', start=r'Person:\s+(\w+)', stop='-'*30):
    rxstart = re.compile(start)
    rxstop = re.compile(stop)
    if isinstance(linesrc, basestring): linesrc = open(linesrc)
    lineit = iter(linesrc)
    files = []
    for line in lineit:
        match = rxstart.search(line)
        if not match: continue
        name = match.group(1)
        filename = name.lower() + '.txt'
        filename = os.path.join(outdir, filename)
        #print 'opening file %r'%filename
        files.append(filename)
        fout = open(filename, 'a') # append in case repeats?
        fout.write(match.group(0)+'\n') # did you want aaa bbb stuff?
        for data_line in lineit:
            if rxstop.search(data_line):
                #print 'closing file %r'%filename
                fout.close() # don't write line with ending mark
                fout = None
                break
            else:
                fout.write(data_line)
        if fout:
            fout.close()
            print 'file %r ended with source file EOF, not stop mark'%filename
    return files

def get_testfile():
    from StringIO import StringIO
    return StringIO("""\
...irrelevant leading stuff ...
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
irrelevant
trailing stuff ...

with a blank line
""")

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if not args: raise SystemExit(__doc__)
    tf = args.pop(0)
    if tf=='-tf': fin = get_testfile()
    else: fin = tf
    if not args:
        files = extractFileSegs(fin)
    elif len(args)==1:
        files = extractFileSegs(fin, args[0])
    elif len(args)==2:
        files = extractFileSegs(fin, args[0], args[1], '^$') # stop on blank line?
    else:
        files = extractFileSegs(fin, args[0], '|'.join(args[1:-1]), args[-1])
    print '\nFiles created:'
    for fname in files:
        print ' "%s"'% fname
    if tf == '-tf':
        for fpath in files:
            print '====< %s >====\n%s============'%(fpath, open(fpath).read())
----------------------------------------------------------------------------------

Running on your test data:

[15:19] C:\pywk\clp>md extracteddata

[15:19] C:\pywk\clp>py24 extractfilesegs.py -tf

Files created:
 "extracteddata\jimmy.txt"
 "extracteddata\sarah.txt"
====< extracteddata\jimmy.txt >====
Person: Jimmy
Current Location: Denver
Next Location: Chicago
============
====< extracteddata\sarah.txt >====
Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
============

[15:20] C:\pywk\clp>md xd

[15:20] C:\pywk\clp>py24 extractfilesegs.py -tf xd (Jimmy) ----

Files created:
 "xd\jimmy.txt"
====< xd\jimmy.txt >====
Jimmy
Current Location: Denver
Next Location: Chicago
============

[15:21] C:\pywk\clp>py24 extractfilesegs.py -tf xd "Person: (Sarah)" ----

Files created:
 "xd\sarah.txt"
====< xd\sarah.txt >====
Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
============

[15:22] C:\pywk\clp>py24 extractfilesegs.py -tf xd "^(irrelevant)"

Files created:
 "xd\irrelevant.txt"
====< xd\irrelevant.txt >====
irrelevant
trailing stuff ...
============

HTH, NO WARRANTIES ;-)

Regards,
Bengt Richter
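
Note for anyone reading this with Python 3: the script above is Python 2 (print statements, basestring, the StringIO module), so it will not run unchanged on a current interpreter. What follows is only a minimal sketch of the same start/stop scan in Python 3, not a port of the script. The names extract_segments and SAMPLE are made up for illustration, and it returns the segments in a dict instead of writing files, so writing each entry out to name + '.txt' is left to you.

```python
import re

def extract_segments(lines, start=r'Person:\s+(\w+)', stop='-' * 30):
    """Collect lines between a start match and a stop match.

    Hypothetical helper: returns {lowercased group 1: [lines]} instead of
    writing files, so it can be tried without touching the disk.
    """
    rx_start = re.compile(start)
    rx_stop = re.compile(stop)
    segments = {}
    line_it = iter(lines)
    for line in line_it:
        match = rx_start.search(line)
        if not match:
            continue
        # setdefault extends an earlier chunk if the same name shows up again
        chunk = segments.setdefault(match.group(1).lower(), [])
        chunk.append(match.group(0))
        # The inner loop pulls from the same iterator, so the outer loop
        # resumes on the line after the stop mark.
        for data_line in line_it:
            if rx_stop.search(data_line):
                break  # the stop line itself is not kept
            chunk.append(data_line.rstrip('\n'))
    return segments

# Sample input copied from the question in this thread
SAMPLE = """\
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
"""

if __name__ == '__main__':
    for name, chunk in extract_segments(SAMPLE.splitlines()).items():
        print('%s.txt:' % name)
        print('\n'.join(chunk))
```

The one trick worth keeping from the original is sharing a single iterator between the outer and inner loops; that is what lets the inner loop eat a whole segment without any index bookkeeping.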