On Aug 1, 12:53 pm, dusans <[EMAIL PROTECTED]> wrote:
> On Jul 31, 10:07 pm, [EMAIL PROTECTED] wrote:
>
> > I am using regular expressions to search a string (always full
> > sentences, maybe more than one sentence) for common abbreviations and
> > remove the periods. I need to break the string into different
> > sentences but split('.') doesn't solve the whole problem because of
> > possible periods in the middle of a sentence.
> >
> > So I have...
> >
> > ----------------
> >
> > import re
> >
> > middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
> >
> > # this will find abbreviations like e.g. or i.e. in the middle of a
> > # sentence.
> > # then I want to remove the periods.
> >
> > ----------------
> >
> > I want to keep the ie or eg but just take out the periods. Any
> > ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
> > string will take out the entire abbreviation with the alphanumeric
> > characters included.
>
> It's impossible with regex. U could try it with a statistical analysis;
> and even this would give u a good split.
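For the narrower question in the quoted post (keep the "ie"/"eg" letters, drop only the periods), here is a minimal sketch. It assumes an abbreviation is exactly two single alphanumeric characters, each followed by a period, as in the quoted pattern. Each character goes in a capture group and the replacement refers back to the groups, so only the periods disappear. The `\b` word boundary and the helper names `strip_abbr_periods` and `split_sentences` are hypothetical additions for illustration, not from the thread, and the sentence splitter is deliberately naive.

```python
import re

# Assumption: an abbreviation is two single alphanumeric characters, each
# followed by a period ("e.g.", "i.e."). The \b (my addition) stops a match
# from starting in the middle of a longer word such as "abc.d.".
middle_abbr = re.compile(r'\b([A-Za-z0-9])\.([A-Za-z0-9])\.')

def strip_abbr_periods(txt):
    """Keep the two captured characters and drop both periods: 'e.g.' -> 'eg'."""
    return middle_abbr.sub(r'\1\2', txt)

def split_sentences(txt):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return re.split(r'(?<=[.!?])\s+', strip_abbr_periods(txt))

text = "Use a tool, e.g. a parser, i.e. something robust. Then stop."
print(strip_abbr_periods(text))
# Use a tool, eg a parser, ie something robust. Then stop.
print(split_sentences(text))
# ['Use a tool, eg a parser, ie something robust.', 'Then stop.']
```

This still splits wrongly after "Mr." or a mid-sentence "etc.", so it does not contradict the point that a regex alone is not a reliable sentence splitter.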
"and even this won't* give u a good split." :P

--
http://mail.python.org/mailman/listinfo/python-list