I am trying to write a program in Python that will edit .txt log files that contain regression output from R. Any thoughts or suggestions would be greatly appreciated.
To give an idea of what I am trying to do: I include fixed effects in the R regressions, which produces hundreds of extra lines per regression that I am not interested in right now. I want to save a shortened version of each .txt file in which each block of fixed-effects coefficients is replaced by a single line saying that the regression includes fixed effects for the relevant variable.

All the lines to be deleted start with 'factor(xyz)', where xyz is the variable name, so their first seven characters are always 'factor('. My idea is to have Python copy each line to a new file only if its first seven characters are not 'factor('. That part I at least know how to approach.

However, I am not sure how to add the line that says "includes fixed effects for xyz." The two problems are:

1. In the resulting file I will be skipping blocks of anywhere from 10 to 500 or so lines and inserting one line in their place, so whether the line is inserted has to depend on whether the current line is the first line of a block or one of the remaining 499.
2. The xyz variable name varies in length. For example, one block might be 'state' and another 'yr'. Maybe I can use the fact that the name starts after the first '(' and ends at the first ')' in the line? I think I can use the re module for this?

Any suggestions on any aspect of this, but especially the latter part, would be greatly appreciated. Thank you.
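For concreteness, here is a minimal sketch of the approach described above. It rests on assumptions I have not checked against the real logs: every fixed-effects row begins with 'factor(' at the very start of the line, and the variable name contains no ')'. The names FACTOR_RE, shorten_lines and shorten_file are made up for illustration.

```python
import re

# Matches a line that starts with 'factor(' and captures everything
# between that first '(' and the first ')' as the variable name.
FACTOR_RE = re.compile(r"factor\(([^)]*)\)")


def shorten_lines(lines):
    """Yield lines, replacing each run of factor(...) rows with one note."""
    current_var = None  # variable of the block being skipped, if any
    for line in lines:
        match = FACTOR_RE.match(line)  # match() only looks at the line start
        if match is None:
            current_var = None  # we have left any factor block
            yield line
            continue
        var = match.group(1)
        if var != current_var:
            # First row of a new block: emit the note exactly once.
            yield "includes fixed effects for %s\n" % var
            current_var = var
        # Remaining rows of the same block are dropped.


def shorten_file(src_path, dst_path):
    """Write a shortened copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(shorten_lines(src))


sample = [
    "(Intercept) 1.0\n",
    "factor(state)AL 0.1\n",
    "factor(state)AK 0.2\n",
    "factor(yr)2001 0.3\n",
    "x 2.0\n",
]
shortened = list(shorten_lines(sample))
```

Remembering the variable name of the block currently being skipped handles point 1 (the note is written only when the name changes or a block starts), and the capture group in the regular expression handles point 2 (names of any length). Two adjacent blocks for different variables each get their own note.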