A S <aishan0...@gmail.com> writes:

> I would like to extract all words within specific keywords in a .txt
> file. For the keywords, there is a starting keyword of "PROC SQL;" (I
> need this to be case insensitive) and the ending keyword could be
> either "RUN;", "quit;" or "QUIT;". This is my sample .txt file.
>
> Thus far, this is my code:
>
> with open('lan sample text file1.txt') as file:
>     text = file.read()
> regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
> k = regex.findall(text)
> print(k)

Try

  re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')

Read up on what (?si) means and what (?:...) means. You can get the
same effect as (?si) by passing flags to the compile method.

> Output:
>
> [('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]

Your main issue is that | binds weakly. Your whole pattern tries to
match any one of just four short sub-patterns:

  PROC SQL;
  proc sql;(.*?)RUN;
  quit;
  QUIT;

--
Ben.
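
A minimal sketch of the points above, for anyone who wants to try them out. The sample text is invented (the original attachment is not available), and the names `original`, `greedy`, `lazy` and `bodies` are only illustrative. It shows how the original pattern splits into four alternatives, what the suggested pattern matches, and a variant that passes flags to `re.compile` and uses a non-greedy `.*?`. The non-greedy form is an assumption about what is wanted when the file holds more than one PROC SQL block, since a greedy `.*` runs from the first start keyword to the last end keyword.

```python
import re

# Hypothetical sample input; the real .txt file was not included.
text = """\
data a; set b; run;
PROC SQL;
  select * from t1;
QUIT;
proc sql;
  create table x as select * from t2;
quit;
"""

# Original pattern: "|" binds weakly, so the group is four separate
# alternatives, and "." does not cross newlines without the s flag.
original = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
print(original.findall(text))

# Suggested pattern with inline flags: s (dot matches newline) and
# i (ignore case). The greedy .* spans from the first "PROC SQL;" to
# the last end keyword, so both blocks come back as a single match.
greedy = re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
print(len(greedy.findall(text)))

# Same flags passed to compile(), with a non-greedy .*? so that each
# block is matched on its own. The capturing group keeps only the
# text between the keywords; (?:...) groups without capturing.
lazy = re.compile(r'PROC SQL;(.*?)(?:QUIT|RUN);', re.IGNORECASE | re.DOTALL)
bodies = [body.strip() for body in lazy.findall(text)]
print(bodies)
```

Because `findall` returns the contents of the capturing groups when a pattern has any, the original pattern yields tuples with an empty second element, while the last pattern yields just the text inside each block.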