On Wed, Nov 4, 2009 at 4:27 AM, Tim Chase <python.l...@tim.thechases.com> wrote:
> kylin wrote:
>
>> I need to remove the word if it appears in the paragraph twice. could
>> some give me some clue or some useful function in the python.
>
> Sounds like homework. To fail your class, use this one:
>
> >>> p = "one two three four five six seven three four eight"
> >>> s = set()
> >>> print ' '.join(w for w in p.split() if not (w in s or s.add(w)))
> one two three four five six seven eight
>
> which is absolutely horrible because it mutates the set within the list
> comprehension. The passable solution would use a for-loop to iterate over
> each word in the paragraph, emitting it if it hadn't already been seen.
> Maintain those words in set, so your words know how not to be seen. ("Mr.
> Nesbitt, would you please stand up?")

Can we use inp_paragraph.count(iter_word) to make it simple?

> This also assumes your paragraph consists only of words and whitespace. But
> since you posted your previous homework-sounding question on stripping out
> non-word/whitespace characters, you'll want to look into using a regexp like
> "[\w\s]" to clean up the cruft in the paragraph. Neither solution above
> preserves non white-space/word characters, for which I'd recommend using a
> re.sub() with a callback. Such a callback class might look something like
>
> >>> class Dedupe:
> ...     def __init__(self):
> ...         self.s = set()
> ...     def __call__(self, m):
> ...         w = m.group(0)
> ...         if w in self.s: return ''
> ...         self.s.add(w)
> ...         return w
> ...
> >>> r.sub(Dedupe(), p)
>
> where I leave the definition of "r" to the student. Also beware of
> case-differences for which you might have to normalize.
>
> You'll also want to use more descriptive variable names than my one-letter
> tokens.
>
> -tkc
>
> --
> http://mail.python.org/mailman/listinfo/python-list

--
Yours,
S.Selvam
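
For reference, a minimal sketch of the two approaches discussed in this thread: the for-loop with a set of seen words, and a count()-based variant. It is written for Python 3, whereas the quoted session uses Python 2 print syntax. The function names are made up for illustration, and the count() version counts within the list of words kept so far rather than within the paragraph string. That choice is an assumption about what would work, since str.count() matches substrings (for example, "the other".count("the") is 2) and a count on the whole paragraph does not say which occurrence to keep.

```python
def dedupe_with_set(paragraph):
    """Keep the first occurrence of each word, tracking seen words in a set."""
    seen = set()
    kept = []
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


def dedupe_with_count(paragraph):
    """Same result using list.count() on the words kept so far.

    Each count() call scans the whole list, so this is quadratic in the
    number of words, while the set lookup above is roughly constant time.
    """
    kept = []
    for word in paragraph.split():
        if kept.count(word) == 0:
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    p = "one two three four five six seven three four eight"
    print(dedupe_with_set(p))
    print(dedupe_with_count(p))
```

Both versions split on whitespace only, so punctuation and case differences are treated as part of the word, as noted in the quoted message.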