On Oct 31, 12:48 pm, elca <high...@gmail.com> wrote:
> Hello,
> I have a text document to parse. The sample text is shown below.
> From this document, I would like to extract something like:
> SUBJECT = 'NETHERLANDS MUSIC EPA'
> CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble
> performs during a concert in Amsterdam, The Netherlands, 30 October 2009.
> Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK'
>
> If anyone can help me, I would much appreciate it.
>
> "
> NETHERLANDS MUSIC EPA | 36 before
> Michael Buble performs in Amsterdam Canadian singer Michael Buble performs
> during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble
> released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
> "
You really don't need regular expressions for this; plain string methods are enough:

>>> text = '''
... NETHERLANDS MUSIC EPA | 36 before
... Michael Buble performs in Amsterdam Canadian singer Michael Buble performs
... during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble
... released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
... '''
>>> text = text.strip()  # drop the leading and trailing newlines
>>> lines = text.splitlines()  # handles both '\n' and '\r\n' line endings
>>> subject = lines[0].split(' | ')[0]
>>> content = ' '.join(lines[1:])
>>> subject
'NETHERLANDS MUSIC EPA'
>>> content
"Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"

--
http://mail.python.org/mailman/listinfo/python-list
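The same string-method approach can be wrapped in a small reusable function. This is a minimal sketch, assuming each record is a header line of the form `SUBJECT | ...` followed by the wrapped body lines, and that a larger document separates records with blank lines (an assumption; the sample above shows only one record). The names `parse_record` and `parse_document` are hypothetical helpers, not part of any library.

```python
def parse_record(record):
    """Split one record into (subject, content).

    Assumes the first line holds the subject before ' | ' and every
    following line belongs to the wrapped content.
    """
    lines = record.strip().splitlines()
    if not lines:
        raise ValueError("empty record")
    # Everything before the first ' | ' on the header line is the subject.
    subject = lines[0].split(' | ')[0].strip()
    # Re-join the wrapped body lines into a single string.
    content = ' '.join(line.strip() for line in lines[1:])
    return subject, content


def parse_document(text):
    """Parse every record in text, assuming records are separated by blank lines."""
    records = []
    block = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current record.
            records.append(parse_record('\n'.join(block)))
            block = []
    if block:
        # The last record may not be followed by a blank line.
        records.append(parse_record('\n'.join(block)))
    return records


if __name__ == "__main__":
    sample = (
        "NETHERLANDS MUSIC EPA | 36 before\n"
        "Michael Buble performs in Amsterdam Canadian singer Michael Buble performs\n"
        "during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble\n"
        "released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK\n"
    )
    for subject, content in parse_document(sample):
        print("SUBJECT =", repr(subject))
        print("CONTENT =", repr(content))
```

If the real file uses some other record separator, only `parse_document` needs to change; `parse_record` keeps the same split-on-`' | '` logic as the interactive session.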