On Thu, Nov 09, 2006 at 12:52:57AM +1100, John O'Hagan wrote:
> > > > tr -d '-\n' <IN | tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Processor'
> > > > | grep -v 'Processor\|--'
> > > > [...]
>
> Aha! You're right, my lines fail on the edge cases, and also when the target
> word is hyphenated.
>
> Your ingenious approach didn't always work either [1]; but it revealed (to
> me)
yup. okay, another one below

> that there will be unresolvable ambiguities in the IN file unless:
>
> EITHER: A) lines are broken arbitrarily without hyphenation, in which case
> newlines have no significance, spaces between words must be preserved and we
> can use:
>
> #tr -d '\n' < IN | tr ' ' '\n' | grep -B1 Processor | grep -v 'Processor\|--'
>
> or in Python:
>
> #for i in open('IN').read().replace('\n', '').split('Processor')[0:-1]:
> #    print i.split()[-1]
>
> OR: B) broken words are hyphenated, and unhyphenated newlines are equivalent
> to spaces, in which case we could use something like:
>
> ----------------
> while read i ; do
>
>     if [[ $(echo "$i" | grep \\-\$ ) ]]; then
>
>         i=$( echo "$i" | sed s/-\$//)
>         echo "$i"
>     else echo "$i"' '
>     fi
>
> done < IN | tr -d '\n' | tr ' ' '\n' | grep -B1 'Processor' |
> grep -v 'Processor\|--'
> --------------------------------
>
> This removes hyphens at the end of lines or else adds a space, which converts

here's a simpler way to do that, I think ;)

tr '\n' ' ' < IN | sed 's/- //g' | tr ' ' '\n' | grep -B1 'Processor' | grep -v 'Processor\|--'

It replaces the newlines with spaces, then uses sed to strip occurrences of '- ', because sed matches that two-character sequence whereas tr only works on sets of single characters. The assumption here is that hyphens don't appear at the end of words in the text, so every '- ' was created by our tr '\n' ' '.

So I wonder what happened to the OP? Is he just watching, waiting for the right solution, or is he long gone?

A

>
> [1] I tried Andrew's solution above and found that it only always worked on
> the unhyphenated case, I think because tr treats its arguments as character
> sets, not expressions, so that tr -d '\-\n' (note the escape required for the
> hyphen) deletes any hyphens or newlines, not just that combination.

yeah, that's what happens when you only think about the problem and don't actually test it. the above was briefly tested...

A
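A quick way to check both the simpler pipeline and the tr behaviour described in footnote [1] is a small self-contained script. This is a minimal sketch, not a tested part of the thread: the function name words_before_processor and the sample text are hypothetical, and it assumes GNU grep (for the \| alternation in a basic regex) and GNU sed.

```shell
#!/bin/sh
# Hypothetical wrapper around the simpler pipeline; $1 is the input file.
words_before_processor() {
  # 1. join all lines with spaces, so a line-final hyphen becomes "- "
  # 2. sed deletes every "- " sequence, re-joining the broken word
  # 3. put one word per line
  # 4. print each Processor line plus the line before it; grep inserts
  #    "--" between groups that are not adjacent
  # 5. drop the Processor lines and the "--" separators (GNU \| alternation)
  tr '\n' ' ' < "$1" | sed 's/- //g' | tr ' ' '\n' |
    grep -B1 'Processor' | grep -v 'Processor\|--'
}

# Hypothetical sample input: one word is hyphenated across a line break.
IN=$(mktemp)
trap 'rm -f "$IN"' EXIT
printf '%s\n' \
  'The Intel Processor and the AMD' \
  'Processor are listed; a multi-' \
  'core Processor too.' > "$IN"

words_before_processor "$IN"
# prints:
# Intel
# AMD
# multicore

# Footnote [1]: tr reads '\-\n' as the character SET {hyphen, newline},
# so tr -d removes every hyphen, not just the line-final ones.
printf 'well-known multi-\ncore\n' | tr -d '\-\n'
echo
# prints: wellknown multicore
```

The sed step is what makes the difference: it only removes a hyphen that is immediately followed by a space, whereas tr -d drops the hyphen in "well-known" too. As stated above, a genuine "- " already present in the text (a spaced dash, say) would also be swallowed.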