On Mon, Nov 06, 2006 at 07:55:02AM -0800, Andrew Sackville-West wrote:
> On Tue, Nov 07, 2006 at 01:00:34AM +1100, John O'Hagan wrote:
> > On Monday 06 November 2006 18:38, David Jardine wrote:
> > > On Mon, Nov 06, 2006 at 11:27:58AM +1100, John O'Hagan wrote:
> > > > [...]
> > > >
> > > > E.g., if IN contains:
> > > >
> > > > junk info 18 Pro
> > >
> > > But what if that line were:
> > >
> > > junk info 18 Pro-
> > >
> > > which seems more likely?
> > >
> > [...]
> >
> > You're right; but the OP, Michael, gave the above scenario as his problem.
> > If your situation were the case, though, I guess we could use tr -d '-' to
> > get rid of all the hyphens first as well.
>
> the problem there is what if the desired result word includes a
> hyphen, then you'll have modified your result. I think you should go
> ahead and tr -d '\n' | tr ' ' '\n' | and then grep for a regex of
> Processor that allows for hyphens. you could limit it to the usual
> hyphen locations Pro-cess-or or is it Pro-ces-sor?

That's a good example (though not the OP's) of the hyphenation problem: a
solution might look for, or allow, the correct hyphenation points, but the
word may in fact have been hyphenated incorrectly, and then the pattern
would fail.

I'm not aware of a regex construct that matches a word with some specific
token appearing anywhere within it, which is how I read the OP's statement,
i.e., that the newline might be anywhere in the word. Here's a simple
brute-force regex for perl that will do the deed; it captures the number and
allows any run of newlines or hyphens between each pair of letters:

    m/(\d+)\s+P[\n-]*r[\n-]*o[\n-]*c[\n-]*e[\n-]*s[\n-]*s[\n-]*o[\n-]*r/msg

Ken

-- 
Ken Irving, [EMAIL PROTECTED]
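
For what it's worth, the brute-force pattern does not have to be typed out by hand; it can be generated from the word itself. Below is a minimal sketch of that idea, written in Python rather than perl so it runs standalone (the regex syntax for this pattern is the same in both). The names gap_tolerant_pattern and find_counts and the sample text are made up for illustration and are not from the thread.

```python
import re

def gap_tolerant_pattern(word, gap=r"[\n-]*"):
    # Put the gap class between every pair of letters, so any run of
    # newlines or hyphens may interrupt the word at any position.
    return gap.join(re.escape(ch) for ch in word)

# Same shape as the hand-written regex: a number, whitespace, then the word.
PROCESSOR_RE = re.compile(r"(\d+)\s+" + gap_tolerant_pattern("Processor"))

def find_counts(text):
    # Return each number that directly precedes a (possibly broken) "Processor".
    return PROCESSOR_RE.findall(text)

# Hypothetical input: one word broken with a hyphen, one without.
sample = "junk info 18 Pro-\ncessor\nmore junk 4 Proc\nessor\n"
print(find_counts(sample))
```

Because the gap is allowed between every pair of letters, this does not depend on the word being hyphenated at a "correct" syllable boundary, and the input itself is left unmodified, so a legitimately hyphenated result word elsewhere is not damaged.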