On Sat, Nov 04, 2006 at 01:03:14PM -0900, Ken Irving wrote:
> On Fri, Nov 03, 2006 at 09:56:12PM -0500, Douglas Tutty wrote:
> > On Fri, Nov 03, 2006 at 08:27:42PM +0000, michael wrote:
> > > I've been trying to do this with 'awk' but am hitting probs (not used
> > > awk for ages!) so all offers welcome!
> > >
> > > Given a multiple line file, IN, that contains the word Processor
> > > (possibly split over 2 lines) I wish to output the field immediately
> > > preceeding Processor.
> > >
> > > eg for
> > >
> > > junk info 18 Pro
> > > cessor
> > >
> > > I wish to get the field '18'
> >
> > I've read the replies telling you about awk and it reminds me why I
> > never use awk or regular expressions. My mind doesn't do cryptic. I
> > either do fortran77 or python. For this I would use python so you can
> > lay it out step by step logically.
> >
> > Since it appears that newlines aren't significant, I would get rid of
> > them.
> >
> >     IN = open('IN')
> >     instring = IN.read()
> >     IN.close()
> >
> > I would remove all newlines so it was one huge line.
> >
> >     onelinestring = instring.replace('\n', ' ')
> >     del instring
> >
> > Split the string into a list of words
> >
> >     inlist = onelinestring.split()
> >     del onelinestring
> >
> > Iterate through the list looking for 'processor'
> >
> >     oldword = ' '
> >     for newword in inlist
> >         if word.lower == 'processor'
> >             print oldword # the previous word
> >         oldword = newword
> >
> >     del inlist
> >
> > So I did it in 8 lines instead of one, but in 10 years I'll still know
> > what those 8 lines do. All the del lines do is free memory as soon as
> > possible as there is no need to keep multiple versions of the file
> > around. Internally, I don't know how awk and regular expressions handle
> > this.
>
> Is this pseudo-code or does it actually run? I had to add some crypic
> noise, I mean ':' characters, in a couple of places, change "word" to
> "newword", and it still didn't seem to work. The interesting part of
> the otherwise mundane problem was that the pattern to match is perhaps
> on two different lines. I don't see how this is addressed in the
> proffered solution.
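
To make that objection concrete, here is a small sketch of what the quoted
steps produce on the sample input from the original question. The variable
names (text, words, found) are only for illustration and are not from the
earlier posts; the sample string stands in for the contents of the IN file.

```python
# Sample input from the original question: "Processor" is split over two lines.
text = "junk info 18 Pro\ncessor\n"

# Same steps as in the quoted approach: turn newlines into spaces, then split.
words = text.replace("\n", " ").split()

# The split word arrives as two separate tokens, 'Pro' and 'cessor', so a
# comparison of each single token against 'processor' cannot succeed.
found = any(w.lower() == "processor" for w in words)

print(words)
print(found)
```

So whatever else is fixed in the quoted loop, it needs to look at two
adjacent tokens together to catch the split case.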

Ok, a bit of python hacking later...  The same technique shown previously
(in awk) can be used:

    #!/usr/bin/python
    # Print the field preceding 'Processor', even when that word is
    # split across two whitespace-separated tokens (e.g. by a newline).
    olderword = ' '
    oldword = ' '
    for newword in open('IN').read().split():
        if newword.lower() == 'processor':
            print(oldword)          # the previous word
        else:
            # try combining new and old word...
            if oldword.lower() + newword.lower() == 'processor':
                print(olderword)    # the previouser word...
        olderword = oldword
        oldword = newword

There doesn't seem to be any need for storing/deleting variables for
handling the input, nor for replacing newlines with spaces: split() with
no arguments already splits on any whitespace, newlines included.

Ken

-- 
Ken Irving, [EMAIL PROTECTED]