On Jul 13, 8:14 pm, MRAB <[EMAIL PROTECTED]> wrote:
> On Jul 14, 12:05 am, Chris <[EMAIL PROTECTED]> wrote:
> > I'm trying to delimit sentences in a block of text by defining the
> > end-of-sentence marker as a period followed by a space followed by an
> > uppercase letter or end-of-string.
> >
> > I'd imagine the regex for that would look something like:
> > [^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)
> >
> > However, Python keeps giving me an "unbalanced parenthesis" error for
> > the [^] part. If this isn't valid regex syntax, how else would I match
> > a block of text that doesn't match the delimiter pattern?
>
> What is the [^(?:[A-Z]|$)] part meant to be doing? Is it meant to be
> matching everything up to the end of the sentence?
>
> [...] is a character class, so Python is parsing the character class
> as:
>
> [^(?:[A-Z]|$)]
> ^^^^^^^^^^

It was meant to match everything except the end-of-sentence pattern.
However, I just realized that I can simply replace it with the
non-greedy ".*?".
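
For the archives, here is a rough sketch of both points, not a definitive solution. The first helper just confirms that the original pattern fails to compile; the second tries the ".*?" idea. The names compile_error, SENTENCE and split_sentences are made up for illustration. I have also assumed two small changes to the pattern from the thread: the whitespace is moved inside the lookahead so it is not consumed, and the final sentence is allowed to end without trailing whitespace.

```python
import re
import warnings


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    with warnings.catch_warnings():
        # Newer Pythons may also warn about '[' inside a character class.
        warnings.simplefilter("ignore")
        try:
            re.compile(pattern)
        except re.error as exc:
            return str(exc)
    return None


# Lazy ".*?" stands in for "everything up to the delimiter". The lookahead
# accepts whitespace plus an uppercase letter, or optional whitespace and
# end-of-string. DOTALL lets a sentence span line breaks.
SENTENCE = re.compile(r".*?\.(?=\s+[A-Z]|\s*$)", re.DOTALL)


def split_sentences(text):
    """Split text into sentences using the SENTENCE pattern above."""
    return [m.group().strip() for m in SENTENCE.finditer(text)]


if __name__ == "__main__":
    # The class ends at the first ']', leaving a stray ')' later on.
    print(compile_error(r"[^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)"))
    print(split_sentences("Hello world. This is v1.5 of the tool. Done."))
```

Note that a period followed by a space and a capital (e.g. after "Mr.") will still be treated as a sentence end; that limitation comes from the delimiter definition itself, not from the regex syntax.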