On Mon, Nov 06, 2006 at 07:55:02AM -0800, Andrew Sackville-West wrote:
> On Tue, Nov 07, 2006 at 01:00:34AM +1100, John O'Hagan wrote:
> > On Monday 06 November 2006 18:38, David Jardine wrote:
> > > On Mon, Nov 06, 2006 at 11:27:58AM +1100, John O'Hagan wrote:
> > 
> > [...]
> > 
> > > > E.g., if IN contains:
> > > >
> > > > junk info 18 Pro
> > >
> > > But what if that line were:
> > >
> > > junk info 18 Pro-
> > >
> > > which seems more likely?
> > >
> > 
> > [...]
> > 
> > You're right; but the OP, Michael, gave the above scenario as his problem.
> > If your situation were the case, though, I guess we could use tr -d '-' to
> > get rid of all the hyphens first as well.
> 
> The problem there is that if the desired result word includes a
> hyphen, you'll have modified your result. I think you should go
> ahead and tr -d '\n' | tr ' ' '\n' | and then grep for a regex of
> Processor that allows for hyphens. You could limit it to the usual
> hyphen locations: Pro-cess-or, or is it Pro-ces-sor?

That's a good example (though not the OP's) of the hyphenation problem:
a solution might allow only for the correct hyphenation points, but the
text might in fact be hyphenated incorrectly, and then the pattern would
fail.  I'm not aware of a regex construct that matches a word with some
specific token appearing anywhere within it; that was how I read the
OP's statement, i.e., that the newline might be anywhere in the word.

Here's a simple brute-force regex for perl that will do the deed:

  m/(\d+)\s+P[\n-]*r[\n-]*o[\n-]*c[\n-]*e[\n-]*s[\n-]*s[\n-]*o[\n-]*r/msg

Ken
-- 
Ken Irving, [EMAIL PROTECTED]

