First of all, thank you all for your answers. I receive python-list as a daily digest, so it is not easy for me to quote your replies individually.
I will try to explain my situation as well as I can, but English is not my native language, so I am not sure I can make it clear.

Every SECTION starts with 2 special lines. These 2 lines are special because they begin with the same characters (the length of these characters varies from section to section); these shared leading characters are called the KEY of the section. Any 2 neighboring sections have different KEYs. After the 2 special lines come some paragraphs, and the paragraph lines do not begin with the KEY. So a section consists of the 2 special lines beginning with the KEY, followed by some paragraph lines that do not begin with the KEY.

However, there may be some paragraphs before the first section, which I do not need and want to drop.

I need a method to split the whole text into SECTIONs and to know all the KEYs.

I have tried to solve this problem with the re module, but failed. Maybe I can make myself clear by showing the regular expression object:

import re
reobj = re.compile(r"(?P<bookname>[^\r\n]*?)[^\r\n]*?\r\n(?P=bookname)[^\r\n]*?\r\n.*?", re.DOTALL)

It can get the first 2 lines of a section, but it fails to get the rest of the section, which does not begin with the KEY. The hard part for me is expressing "the paragraph does not begin with the KEY". Even if I could get the first 2 lines, I suspect a regular expression would be too slow for my text.

That is all. I hope to get some more suggestions. Thanks.

[demo text starts]
a line we do not need
I am section axax
I am section bbb
(and here goes many other text)...
let's continue to
let's continue, yeah
.....(and here goes many other text)...
I am using python
I am using perl
.....(and here goes many other text)...
Programming is hard
Programming is easy
How do you thing?
I do’t know
[demo text ends]

The text above should be split into a LIST with 4 items, and I also need to know the KEYs, which here are ['I am section ', "let's continue", 'I am using ', 'Programming is ']:

lst = [ '''a line we do not need
I am section axax
I am section bbb
(and here goes many other text)...
''', '''let's continue to
let's continue, yeah
.....(and here goes many other text)...
''', '''I am using python
I am using perl
.....(and here goes many other text)...
''', '''Programming is hard
Programming is easy
How do you thing?
I do’t know''' ]
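A minimal sketch of one way to do the split without regular expressions. It assumes that a KEY is simply the longest common prefix of the two header lines, and that it must be at least `min_key_len` characters long; `min_key_len` (default 5) is a hypothetical threshold, not something the question defines, and exists only so that two ordinary lines that happen to start with "I " do not open a section. `split_sections` is also a hypothetical name. The lines are scanned once, so the cost grows linearly with the size of the text; `os.path.commonprefix` is used because it compares strings character by character.

```python
from os.path import commonprefix


def split_sections(text, min_key_len=5):
    """Split text into (key, section_text) pairs (hypothetical helper).

    A section starts wherever two consecutive lines share a common prefix
    of at least min_key_len characters; that prefix is taken as the KEY.
    Lines before the first section are dropped.
    """
    lines = text.splitlines(keepends=True)   # handles both \n and \r\n
    sections = []                            # list of (key, list of lines)
    i = 0
    while i < len(lines):
        key = ""
        if i + 1 < len(lines):
            # Compare the two lines without their line endings.
            key = commonprefix([lines[i].rstrip("\r\n"),
                                lines[i + 1].rstrip("\r\n")])
        if len(key) >= min_key_len:
            # Two lines with a long enough shared prefix: a new section.
            sections.append((key, [lines[i], lines[i + 1]]))
            i += 2
        else:
            if sections:
                # Ordinary paragraph line: belongs to the current section.
                sections[-1][1].append(lines[i])
            # Otherwise we are still before the first section: drop it.
            i += 1
    return [(key, "".join(body)) for key, body in sections]


demo = """a line we do not need
I am section axax
I am section bbb
(and here goes many other text)...
let's continue to
let's continue, yeah
.....(and here goes many other text)...
I am using python
I am using perl
.....(and here goes many other text)...
Programming is hard
Programming is easy
How do you thing?
I do’t know
"""

if __name__ == "__main__":
    for key, body in split_sections(demo):
        print(repr(key))
```

On the demo text this gives four sections and drops "a line we do not need". Note the third KEY: the longest common prefix of "I am using python" and "I am using perl" is 'I am using p', because both words start with "p". If KEYs should end on a word boundary, one option is to cut each prefix back to its last space.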
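For staying with the re module, "a paragraph line that does not start a new section" can be written as a negative lookahead `(?!...)`. A sketch, assuming (hypothetically) a minimum KEY length of 5: two lines share a prefix of at least 5 characters exactly when their first 5 characters are equal, so each paragraph line is accepted only if it and the following line do not begin with the same 5 characters. In the pattern from the question, the lazy `[^\r\n]*?` lets `bookname` match the empty string, so any two lines satisfy it; a greedy group with a minimum length instead backtracks to the longest shared prefix. `SECTION_RE` and `find_sections` are hypothetical names.

```python
import re

# Assumes KEYs are at least 5 characters long (a hypothetical choice).
SECTION_RE = re.compile(
    r"^(?P<key>[^\n]{5,})[^\n]*\n"             # header line 1: KEY + rest of line
    r"(?P=key)[^\n]*\n"                        # header line 2 starts with the same KEY
    r"(?:"                                     # then any number of paragraph lines,
    r"(?!(?P<nk>[^\n]{5})[^\n]*\n(?P=nk))"     # each one NOT starting a new header pair
    r"[^\n]*\n"
    r")*",
    re.MULTILINE,
)


def find_sections(text):
    """Return (key, section_text) pairs; text before the first section is skipped."""
    text = text.replace("\r\n", "\n")          # normalize \r\n line endings
    if not text.endswith("\n"):
        text += "\n"                           # the pattern expects every line to end with \n
    return [(m.group("key"), m.group(0)) for m in SECTION_RE.finditer(text)]


demo = """a line we do not need
I am section axax
I am section bbb
(and here goes many other text)...
let's continue to
let's continue, yeah
.....(and here goes many other text)...
I am using python
I am using perl
.....(and here goes many other text)...
Programming is hard
Programming is easy
How do you thing?
I do’t know
"""

if __name__ == "__main__":
    for key, body in find_sections(demo):
        print(repr(key))
```

Because `finditer` only succeeds at a header pair, the leading "a line we do not need" is skipped, and each match ends just before the next header, so consecutive matches tile the rest of the text.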