On Jul 31, 9:07 pm, [EMAIL PROTECTED] wrote:
> I am using regular expressions to search a string (always full
> sentences, maybe more than one sentence) for common abbreviations and
> remove the periods. I need to break the string into different
> sentences but split('.') doesn't solve the whole problem because of
> possible periods in the middle of a sentence.
>
> So I have...
>
> ----------------
>
> import re
>
> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
>
> # this will find abbreviations like e.g. or i.e. in the middle of a
> # sentence.
> # then I want to remove the periods.
>
> ----------------
>
> I want to keep the ie or eg but just take out the periods. Any
> ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
> string will take out the entire abbreviation with the alphanumeric
> characters included.

It's recommended that you use raw strings for regular expressions, so
that backslashes such as \. and \1 reach the re module unchanged.

Capture the letters using parentheses:

    middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

and replace what was found with what was captured:

    newString = middle_abbr.sub(r'\1\2', txt)

HTH

--
http://mail.python.org/mailman/listinfo/python-list
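
For anyone who wants to try the suggestion above end to end, here is a minimal sketch. The function name strip_abbr_periods and the sample sentence are made up for illustration; only the pattern and the r'\1\2' replacement come from the reply.

```python
import re

# Two single alphanumeric characters, each followed by a period.
# The parentheses capture the characters so they can be reused
# in the replacement string as \1 and \2.
middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')


def strip_abbr_periods(txt):
    """Hypothetical helper: drop the periods, keep the captured characters."""
    return middle_abbr.sub(r'\1\2', txt)


sample = "Bring fruit, e.g. apples, i.e. something fresh."
print(strip_abbr_periods(sample))
# prints: Bring fruit, eg apples, ie something fresh.
```

Note that the character class also includes digits, so a sentence ending in something like "1.2." would be rewritten to "12". If that is not wanted, restricting the class to [A-Za-z] is one possible adjustment.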