Your explanation of the problem you are actually trying to solve gave me some additional insight into your question. It looks like you are writing a Python-HTML templating system that embeds Python within HTML using <python>...</python> tags.
As many may have already guessed, I worked up a pyparsing treatment of your problem. As part of the implementation, I reinterpreted your transformations slightly. You said:

>>>I want to replace the <python> with " ", </python>
>>>with "\n" and every thing that's not between the two
>>>python tags must begin with "\nprint \"\"\"" and
>>>end with "\"\"\"\n"

If this were an HTML page with <python> tags, it might look like:

<some HTML>
<python>
x = 1
</python>
<some more HTML>

The corresponding CGI python code would then read:

print """<some HTML>\n"""
x = 1
print """<some more HTML>\n"""

So we can reinterpret your transformation as:

1. From start of file to first <python> tag, enclose in print """<leading stuff>\n"""
2. From <python> tag to </python> tag, emit the contents unchanged, as Python code
3. From </python> tag to next <python> tag, enclose in print """<stuff between tags>\n"""
4. From last </python> tag to end of file, enclose in print """<ending stuff>\n"""

Or more formally:

<beginning of file> -> 'print r"""'
<python>            -> '"""\n'
</python>           -> 'print r"""'
<end of file>       -> '"""\n'

Now that we have this defined, we can consider adding some standard imports to the <beginning of file> transformation, such as "import sys", etc.

Here is a working implementation. The grammar itself is only about 10 lines of code, mostly in defining the replacement transforms. The last 18 lines are the test case itself: printing the transformed string, and then exec'ing it.

========================
# Take HTML that has <python> </python> tags interspersed, with python code
# between the <python> tags. Convert to running python cgi program.
# replace <python> with '"""\n' and </python> with '\nprint r"""\n'
# also put the imports, a Content-Type header and 'print r"""' at the
# beginning, and '"""\n' at the end
from pyparsing import *

# wrap a parse action so that it runs only on the first match; later
# calls raise ParseException, so StringStart (which does not advance
# the parse position) is replaced just once
class OnlyOnce(object):
    def __init__(self, methodCall):
        self.callable = methodCall
        self.called = False
    def __call__(self,s,l,t):
        if not self.called:
            self.called = True
            return self.callable(s,l,t)
        raise ParseException(s,l,"")

# replacement text for the positional tokens and the two tags
stringStartText = """import sys
print "Content-Type: text/html\\n"
print r\"\"\""""
stringEndText = '"""\n'
startPythonText = '"""\n'
endPythonText = '\nprint r"""\n'

# define grammar
pythonStart = CaselessLiteral("<python>")
pythonEnd = CaselessLiteral("</python>")
sStart = StringStart()
sEnd = StringEnd()

sStart.setParseAction( OnlyOnce( replaceWith(stringStartText) ) )
sEnd.setParseAction( replaceWith(stringEndText) )
pythonStart.setParseAction( replaceWith(startPythonText) )
pythonEnd.setParseAction( replaceWith(endPythonText) )

xform = sStart | sEnd | pythonStart | pythonEnd

# run test case
htmlWithPython = r"""<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>
<python>
for i in range(10):
    print "This is line %d<br>" % i
</python>
</BODY>
</HTML>
"""
generatedPythonCode = xform.transformString( htmlWithPython )
print generatedPythonCode
print
exec(generatedPythonCode)
========================
Here is the output:

import sys
print "Content-Type: text/html\n"
print r"""<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>
"""

for i in range(10):
    print "This is line %d<br>" % i

print r"""

</BODY>
</HTML>
"""


Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>

This is line 0<br>
This is line 1<br>
This is line 2<br>
This is line 3<br>
This is line 4<br>
This is line 5<br>
This is line 6<br>
This is line 7<br>
This is line 8<br>
This is line 9<br>


</BODY>
</HTML>
========================

This exercise was interesting to me in that it uncovered some unexpected behavior in pyparsing when matching on positional tokens (in this case StringStart and StringEnd). I learned that:

1. Since StringStart does not advance the parsing position in the string, it is necessary to ensure that the parse action gets run only once, and then raise a ParseException on subsequent calls. The little class OnlyOnce takes care of this (I will probably fold OnlyOnce into the next point release of pyparsing).

2. StringEnd is not reliably matched during scanString or transformString if there is no trailing whitespace at the end of the input; a single trailing \n is enough to work around it. My first example of test data ended with the closing </HTML> tag, with no trailing newline, and scanString/transformString failed to match. When I added a newline after the closing </HTML> tag, scanString could find the StringEnd. This is not a terrible workaround, but it's another loose end to tie up in the next release.

Enjoy!
-- Paul
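The four substitution rules described in this post can also be applied without pyparsing. The following is a minimal Python 3 sketch, not the pyparsing solution shown above: it swaps in the standard-library `re` and `textwrap` modules, and the names `html_to_python`, `_print_literal`, `PYTHON_BLOCK` and `HEADER` are hypothetical. It assumes `<python>` blocks are never nested, and it emits Python 3 `print()` calls instead of Python 2 `print` statements. It writes each HTML chunk with `repr()` rather than inside `r"""..."""`, so HTML that happens to contain triple quotes still produces valid code.

```python
import re
import textwrap

# Matches one <python>...</python> block, case-insensitively (like the
# CaselessLiteral tags above); DOTALL lets the code span several lines.
# Assumption: blocks are not nested.
PYTHON_BLOCK = re.compile(r"<python>(.*?)</python>", re.IGNORECASE | re.DOTALL)

# Prepended to every generated script: standard imports and the CGI header.
HEADER = 'import sys\nprint("Content-Type: text/html\\n")\n'


def _print_literal(text: str) -> str:
    """Return a Python 3 statement that writes `text` exactly as given."""
    # repr() always yields a valid string literal, even when the HTML
    # itself contains quotes or triple quotes.
    return "print(%r, end='')\n" % text


def html_to_python(source: str) -> str:
    """Hypothetical helper: turn HTML with <python> blocks into a script."""
    parts = [HEADER]
    pos = 0
    for match in PYTHON_BLOCK.finditer(source):
        # Rules 1 and 3: HTML before this <python> tag is printed.
        parts.append(_print_literal(source[pos:match.start()]))
        # Rule 2: code between the tags is passed through, dedented so
        # that an indented block still compiles at top level.
        code = textwrap.dedent(match.group(1)).strip("\n")
        parts.append(code + "\n")
        pos = match.end()
    # Rule 4: everything after the last </python> tag (or the whole
    # input, if there are no tags) is printed as well.
    parts.append(_print_literal(source[pos:]))
    return "".join(parts)


sample = """<HTML>
<BODY>
<python>
for i in range(3):
    print("This is line %d<br>" % i)
</python>
</BODY>
</HTML>"""

# Show the generated script; exec() on this string would print the page.
print(html_to_python(sample))
```

Because the text after the last `</python>` tag is taken by slicing rather than by matching an end-of-string token, input without a trailing newline (like `sample` here) needs no workaround in this sketch.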