On Tuesday 07 November 2006 02:55, Andrew Sackville-West wrote:
> On Tue, Nov 07, 2006 at 01:00:34AM +1100, John O'Hagan wrote:
[...]
> > You're right; but the OP, Michael, gave the above scenario as his
> > problem. If your situation were the case, though, I guess we could use tr
> > -d '-' to get rid of all the hyphens first as well.
>
> the problem there is what if the desired result word includes a
> hyphen, then you'll have modified your result. I think you should go
> ahead and tr -d '\n' | tr ' ' '\n' | and then grep for a regex of
> Processor that allows for hyphens. you could limit it to the usual
> hyphen locations Pro-cess-or or is it Pro-ces-sor?
>
> here's another problem. target word is at end of line with processor
> at beginning of next line. There is only a newline between them and
> so the result becomes
>
> test
> word
> target-wordProcessor
> other
> junk
>
> your grep will return 'word' instead of 'target-word'. You'd have to
> use an old find-replace trick
>
> tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Pro-*cess-*or' | grep -v
> 'Pro-*cess-*or\|--'
>
> this replaces newlines with spaces and then replaces all single or
> multiple occurrences of spaces with newlines. this allows that edge
> case above to come through properly. Then I think the grep is right
> to match zero or more hyphens in processor.
>

I tried this, and found that replacing the newlines with spaces stops the
grep from working, because it puts a space in the middle of any occurrence
of "Processor" that is split across two lines. But I see what you mean about
the edge case. I think this version takes care of it, plus it is
hyphen-agnostic:

    tr -d '\n' <IN | sed 's/P-*r-*o-*c-*e-*s-*s-*o-*r/ Processor/g' |
        tr -s ' ' '\n' | grep -B1 'Processor' | grep -v 'Processor\|--'

That is: remove the newlines, replace every (non-)hyphenated "Processor"
with a space followed by "Processor", split the text into one word per line,
and then do the grep. The final grep -v drops the "Processor" lines and the
"--" separators that grep -B1 prints between groups, leaving only the
preceding words.

And here's a Python version using the re module to deal with the hyphens (the
edge case takes care of itself here):

    import re

    # Split on every (possibly hyphenated) "Processor"; the last chunk comes
    # after the final match, so drop it.
    text = open('IN').read().replace('\n', '')
    for chunk in re.split('P-?r-?o-?c-?e-?s-?s-?o-?r', text)[:-1]:
        print(chunk.split()[-1])  # last word before this "Processor"

Have we done this to death yet? :)

Regards,

John
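
P.S. In case anyone wants to check the edge case without creating an IN file,
here is a rough sketch of the same idea wrapped in a function. The function
name and the sample text are made up for illustration; the regex and the
newline handling are the same as in the loop above. The one addition is a
guard that skips chunks containing no word at all (for example when the text
starts with "Processor"), which would otherwise raise an IndexError.

```python
import re

# Same hyphen-tolerant pattern as in the loop above.
PROCESSOR = re.compile(r'P-?r-?o-?c-?e-?s-?s-?o-?r')


def words_before_processor(text):
    """Return the word immediately before each (possibly hyphenated) 'Processor'.

    Hypothetical helper, not part of the original one-liner.
    """
    # Delete newlines so a "Processor" broken across lines is rejoined, then
    # split on the pattern; the final chunk follows the last match, so drop it.
    chunks = PROCESSOR.split(text.replace('\n', ''))[:-1]
    # Skip chunks with no words in them instead of failing on [-1].
    return [chunk.split()[-1] for chunk in chunks if chunk.split()]


# Made-up sample: the target word ends one line and "Processor" starts the
# next, and a second "Processor" is hyphenated across a line break.
sample = "test word target-word\nProcessor other junk second Pro-\ncessor end\n"
print(words_before_processor(sample))  # ['target-word', 'second']
```

One caveat that applies to the shell version as well: deleting newlines outright
also glues together ordinary words that sit on either side of a line break, so
a target word at the very start of a line comes back joined to the last word
of the previous line.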