On Saturday, September 28, 2013 4:54:35 PM UTC, Tim Chase wrote:
> On 2013-09-28 09:11, cerr wrote:
> > I have a list of sentences and a list of words. Every full word
> > that appears within a sentence shall be extended by <WORD> i.e. "I
> > drink in the house." Would become "I <drink> in the <house>." (and
> > not "I <d<rink> in the <house>.")
>
> This is a good place to reach for regular expressions. They come with
> an "ensure there is a word-boundary here" token, so you can do
> something like the code at the (way) bottom of this email. I've
> pushed it off the bottom in the event you want to try and use regexps
> on your own first. Or if this is homework, at least make you work a
> *little* :-)
>
> > Also, is there a way to make it faster?
>
> The code below should do the processing in roughly O(n) time as it
> only makes one pass through the data and does O(1) lookups into your
> set of nouns. I included code in the regexp to roughly find
> contractions and hyphenated words. Your original code grows slower
> as your list of nouns grows bigger and also suffers from
> multiple-replacement issues (if you have the noun-list of ["drink",
> "rink"], you'll get results that you likely don't want).
>
> My code hasn't considered case differences, but you should be able to
> normalize both the list of nouns and the word you're testing in the
> "modify()" function so that it would find "Drink" as well as "drink".
>
> Also, note that some words serve both as nouns and other parts of
> speech, e.g. "It's kind of you to house me for the weekend and drink
> tea with me."
>
> -tkc
>
>
>
>
>
>
> import re
>
> r = re.compile(r"""
>     \b        # assert a word boundary
>     \w+       # 1+ word characters
>     (?:       # a group
>       [-']    # a dash or apostrophe
>       \w+     # followed by 1+ word characters
>     )?        # make the group optional (0 or 1 instances)
>     \b        # assert a word boundary here
>     """, re.VERBOSE)
>
> nouns = set([
>     "drink",
>     "house",
>     ])
>
> def modify(matchobj):
>     word = matchobj.group(0)
>     if word in nouns:
>         return "<%s>" % word
>     else:
>         return word
>
> print(r.sub(modify, "I drink in the house"))

Great, only I don't have the re module on my system.... :(
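
For a system where re really is unavailable, the same single-pass, set-lookup idea can be roughly approximated with plain string methods. This is only a sketch under some assumptions: `tag_nouns` is a made-up name (not from Tim's post), words are taken to be whitespace-separated tokens with any leading or trailing punctuation peeled off, and the lookup is lowercased along the lines Tim suggested for catching "Drink" as well as "drink". Splitting on whitespace also means runs of spaces get collapsed to a single space.

```python
import string

nouns = set([
    "drink",
    "house",
    ])

def tag_nouns(sentence, nouns=nouns):
    # Hypothetical helper: one pass over the tokens, O(1) set lookup each.
    out = []
    for token in sentence.split():
        # Peel punctuation off both ends so "house." is tested as "house".
        core = token.strip(string.punctuation)
        if core and core.lower() in nouns:
            # core starts at the first non-punctuation character of token.
            start = token.find(core)
            token = token[:start] + "<" + core + ">" + token[start + len(core):]
        out.append(token)
    return " ".join(out)

print(tag_nouns("I drink in the house."))
```

Because each token is compared whole against the set, a noun list of ["drink", "rink"] does not produce nested replacements, and hyphenated words or contractions are treated as single words, much as in the regexp version.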