Hi,

I would like to remove certain lines from log files. I had some sed/awk scripts for this, but now I want to use Python and its re module for this task.
Actually, I have two different log files. The first file looks like:

...
'some text'
...
 ITER I----------------- GLOBAL ABSOLUTE RESIDUAL -----------------I I------------ FIELD VALUES AT MONITORING LOCATION ----------I
   NO     UMOM     VMOM     WMOM     MASS     T EN     DISS     ENTH         U        V        W        P       TE       ED        T
    1 9.70E-02 8.61E-02 9.85E-02 1.00E+00 1.61E+01 7.65E+04 0.00E+00  1.04E-01-8.61E-04 3.49E-02 1.38E-03 7.51E-05 1.63E-05 2.00E+01
    2 3.71E-02 3.07E-02 3.57E-02 1.00E+00 3.58E-01 6.55E-01 0.00E+00  1.08E-01-1.96E-03 4.98E-02 7.11E-04 1.70E-04 4.52E-05 2.00E+01
    3 2.64E-02 1.99E-02 2.40E-02 1.00E+00 1.85E-01 3.75E-01 0.00E+00  1.17E-01-3.27E-03 6.07E-02 4.02E-04 4.15E-04 1.38E-04 2.00E+01
    4 2.18E-02 1.52E-02 1.92E-02 1.00E+00 1.21E-01 2.53E-01 0.00E+00  1.23E-01-4.85E-03 6.77E-02 1.96E-05 9.01E-04 3.88E-04 2.00E+01
    5 1.91E-02 1.27E-02 1.70E-02 1.00E+00 8.99E-02 1.82E-01 0.00E+00  1.42E-01-6.61E-03 7.65E-02 1.78E-04 1.70E-03 9.36E-04 2.00E+01
...
...
...
 2997 3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.66E-04 6.28E-04 0.00E+00 -3.02E-01 3.56E-02-7.97E-02-7.11E-02 4.08E-02 1.86E-01 2.00E+01
 2998 3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.65E-04 6.26E-04 0.00E+00 -3.02E-01 3.63E-02-8.01E-02-7.10E-02 4.02E-02 1.83E-01 2.00E+01
 2999 3.76E-04 2.89E-04 3.05E-04 2.70E-02 5.64E-04 6.26E-04 0.00E+00 -3.02E-01 3.69E-02-8.04E-02-7.10E-02 3.96E-02 1.81E-01 2.00E+01
 3000 3.78E-04 2.91E-04 3.07E-04 2.74E-02 5.64E-04 6.26E-04 0.00E+00 -3.01E-01 3.75E-02-8.07E-02-7.09E-02 3.91E-02 1.78E-01 2.00E+01
 &&&&&&
-------------------------------------------------------------- ----
....
'some text'
....

I actually want to extract the lines with the numbers, write them to a file and finally use gnuplot for plotting them.
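A minimal sketch of that extraction in Python 3, assuming (from the excerpt only) that the table starts at a header line beginning with NO UMOM, ends at the &&&&&& line, and that each data row is an iteration number followed by 14 values in E notation. Instead of deleting the table, re.search captures it in a group; re.findall then pulls out the numbers, because a negative value can be glued to its neighbour (1.04E-01-8.61E-04), which a plain str.split() would not separate. The names parse_residuals and write_gnuplot_data are hypothetical.

```python
import re

# One value in E notation, e.g. "9.70E-02" or "-8.61E-04". The columns are
# fixed-width, so a negative value can be glued to the previous one
# ("1.04E-01-8.61E-04"); findall() still returns them as separate matches.
NUMBER = re.compile(r'[-+]?\d+\.\d+E[-+]\d+')

# Assumption: the table is everything between the "NO  UMOM ..." header line
# and the "&&&&&&" line. The group captures the rows instead of deleting them.
TABLE = re.compile(r'^\s*NO\s+UMOM\b.*?$(.*?)^\s*&&&&&&',
                   re.DOTALL | re.MULTILINE)

# Assumption: column names copied from the header line of the excerpt
# (7 residuals followed by 7 field values at the monitoring location).
COLUMNS = ['UMOM', 'VMOM', 'WMOM', 'MASS', 'T EN', 'DISS', 'ENTH',
           'U', 'V', 'W', 'P', 'TE', 'ED', 'T']


def parse_residuals(text):
    """Return (iterations, columns); columns maps each name to a list of floats."""
    match = TABLE.search(text)
    if match is None:
        raise ValueError('residual table not found')
    iterations = []
    columns = {name: [] for name in COLUMNS}
    for line in match.group(1).splitlines():
        number = re.match(r'\s*(\d+)\s', line)
        values = NUMBER.findall(line)
        if number is None or len(values) != len(COLUMNS):
            continue  # blank lines, '...' or anything that is not a data row
        iterations.append(int(number.group(1)))
        for name, value in zip(COLUMNS, values):
            columns[name].append(float(value))
    return iterations, columns


def write_gnuplot_data(path, iterations, columns):
    """Write whitespace-separated columns (iteration first) for gnuplot."""
    with open(path, 'w') as out:
        # gnuplot treats lines starting with '#' as comments
        out.write('# NO ' + ' '.join(n.replace(' ', '') for n in COLUMNS) + '\n')
        for i, iteration in enumerate(iterations):
            row = ' '.join(f'{columns[n][i]:.2E}' for n in COLUMNS)
            out.write(f'{iteration} {row}\n')
```

For example, reading the log into a string text and calling iterations, columns = parse_residuals(text) gives one list per column; numpy.array(columns['UMOM']) turns one into an array if NumPy is available. write_gnuplot_data('residuals.dat', iterations, columns) writes a file (hypothetical name) that gnuplot can plot with plot 'residuals.dat' using 1:2 with lines, while matplotlib.pyplot.semilogy(iterations, columns['UMOM']) plots the same column without an intermediate file.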
A nicer, more Pythonic way would be to extract those numbers, store them in arrays by column, and plot them with the gnuplot or matplotlib module :-) Unfortunately, I am pretty new to the re module. So far I have tried the following:

    import re

    # Removes everything from '   NO' up to '&&&&&&'
    pat = re.compile(r'\ \ \ NO.*?&&&&&&', re.DOTALL)
    with open('log_star_orig') as f:
        print(pat.sub('', f.read()))

but this works just the other way around: the original log file is printed without the number part. So the next step would be to delete everything from the first line up to '   NO' and everything from '&&&&&&' to the end, but I do not know how to address the first and last lines. A hint would be nice, and I would be especially interested in an idea for putting those columns into arrays, so that I can plot them right away!

A more difficult log file looks like:

======================================================================
 OUTER LOOP ITERATION =    1                     CPU SECONDS = 2.40E+01
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 0.00 | 1.0E-02 | 5.0E-01 |       4.9E-03  OK|
| V-Mom                | 0.00 | 2.4E-14 | 5.6E-13 |       3.8E+09  ok|
| W-Mom                | 0.00 | 2.5E-14 | 8.2E-13 |       8.3E+09  ok|
| P-Mass               | 0.00 | 1.1E-02 | 3.4E-01 |  8.9  2.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.00 | 1.8E+00 | 1.8E+00 |  5.8  2.2E-08  OK|
| E-Diss.K             | 0.00 | 1.9E+00 | 2.0E+00 | 12.4  2.2E-08  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
 OUTER LOOP ITERATION =    2                     CPU SECONDS = 8.57E+01
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.44 | 1.5E-02 | 5.3E-01 |       9.6E-03  OK|
| V-Mom                |99.99 | 1.1E-03 | 6.2E-02 |       5.7E-02  OK|
| W-Mom                |99.99 | 1.9E-03 | 6.0E-02 |       5.9E-02  OK|
| P-Mass               | 0.27 | 3.0E-03 | 2.0E-01 |  8.9  7.9E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.03 | 5.4E-02 | 4.4E-01 |  5.8  2.9E-08  OK|
| E-Diss.K             | 0.05 | 8.9E-02 | 9.3E-01 | 12.4  2.6E-08  OK|
+----------------------+------+---------+---------+------------------+
...
...
...
======================================================================
 OUTER LOOP ITERATION =  416                     CPU SECONDS = 2.28E+04
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 0.96 | 1.8E-04 | 5.8E-03 |       1.8E-02  OK|
| V-Mom                | 0.98 | 3.6E-05 | 1.5E-03 |       4.4E-02  OK|
| W-Mom                | 0.99 | 4.5E-05 | 2.1E-03 |       4.3E-02  OK|
| P-Mass               | 0.96 | 8.3E-06 | 3.0E-04 | 12.9  4.0E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.98 | 1.5E-03 | 3.0E-02 |  5.7  2.5E-06  OK|
| E-Diss.K             | 0.97 | 4.2E-04 | 1.1E-02 | 12.3  3.9E-08  OK|
+----------------------+------+---------+---------+------------------+

With my sed/awk/grep/gnuplot script, I would use grep to extract the 'U-Mom' rows, print one column (e.g. 'Max Res') to a file, and plot it with gnuplot. Maybe I would have to remove the '|' characters with sed first... Do you have an idea how I can do all of this in Python?

Thanks for your help!

Greetings!
Fabian
-- 
http://mail.python.org/mailman/listinfo/python-list
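For the second log file (the OUTER LOOP ITERATION tables), a minimal sketch without sed, assuming each block starts with an "OUTER LOOP ITERATION = N" line and every table row has exactly five '|'-separated cells, as in the excerpt. Splitting on '|' replaces the sed step; the 'Linear Solution' cell is ignored because its contents vary (e.g. "8.9 2.7E-02 OK"). The names parse_outer_loops and write_column are hypothetical.

```python
import re

ITERATION = re.compile(r'OUTER LOOP ITERATION\s*=\s*(\d+)')
# Assumption: these three table columns always hold a single plain number.
NUMERIC = ['Rate', 'RMS Res', 'Max Res']


def parse_outer_loops(text):
    """Return {equation: {'iteration': [...], 'Rate': [...], 'RMS Res': [...], 'Max Res': [...]}}."""
    data = {}
    iteration = None
    for line in text.splitlines():
        line = line.strip()
        found = ITERATION.search(line)
        if found:
            iteration = int(found.group(1))
            continue
        # Only table rows start with '|'; '+---+', '====' and '----' lines are skipped.
        if iteration is None or not line.startswith('|'):
            continue
        cells = [cell.strip() for cell in line.strip('|').split('|')]
        if len(cells) != 5 or cells[0] == 'Equation':
            continue  # header row or an unexpected line
        # cells[4] ('Linear Solution') is not used here.
        entry = data.setdefault(cells[0], {'iteration': [], **{c: [] for c in NUMERIC}})
        entry['iteration'].append(iteration)
        for column, cell in zip(NUMERIC, cells[1:4]):
            entry[column].append(float(cell))
    return data


def write_column(path, data, equation, column):
    """Write 'iteration value' pairs for one equation and column, e.g. U-Mom / Max Res."""
    entry = data[equation]
    with open(path, 'w') as out:
        for iteration, value in zip(entry['iteration'], entry[column]):
            out.write(f'{iteration} {value:.2E}\n')
```

For example, data = parse_outer_loops(text) followed by write_column('umom_maxres.dat', data, 'U-Mom', 'Max Res') (hypothetical file name) produces a two-column file for gnuplot's plot 'umom_maxres.dat' using 1:2 with lines; alternatively, matplotlib.pyplot.semilogy(data['U-Mom']['iteration'], data['U-Mom']['Max Res']) plots it directly.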