I tried the solutions you provided. They are not as robust as I thought they would be. Maybe I should state the problem more clearly.
Here it goes. I have a bunch of documents, and each document contains a header that is common to all files. I read each file, process it, and compute the frequency of words in it. Now I want to ignore the header in each file. That would be easy if the header were always at the top, but apparently it is not; it can be at the bottom as well. So I want a function that goes through the file content, skips the common header, and returns the remaining text so I can compute the frequencies. Also, the header is not just one line: it includes licences and other material, and may run to 50 or 60 lines. This "remove_header" function has to be efficient, as the files may be huge. Since this is a very small part of the whole problem, I don't want it to slow down my entire code. -- http://mail.python.org/mailman/listinfo/python-list
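To make the requirement concrete, here is a minimal sketch of the behaviour I have in mind. It assumes the header text is known in advance and appears verbatim (same whitespace and line endings) somewhere in each file, which may not hold for every document. The helper `word_frequencies` and the sample header are only illustrations, not part of my real code.

```python
from collections import Counter


def remove_header(text, header):
    """Return text with the first verbatim occurrence of header cut out.

    Assumption: the header is an exact substring of text. If it is not
    found (or is empty), the text is returned unchanged.
    """
    if not header:
        return text
    start = text.find(header)  # substring search runs in C, no Python-level loop
    if start == -1:
        return text
    return text[:start] + text[start + len(header):]


def word_frequencies(text):
    """Count whitespace-separated words, ignoring case (illustrative helper)."""
    return Counter(text.lower().split())


# Made-up header and document; the header sits in the middle of the body here.
header = "License line 1\nLicense line 2\n"
body = "alpha beta\n" + header + "beta gamma\n"

stripped = remove_header(body, header)
frequencies = word_frequencies(stripped)
```

With the sample above, `stripped` no longer contains the two licence lines, wherever in the file they were. Since the header is located with a single `str.find` call, this should stay cheap even for large files, though I have not measured it.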