Derek Martin wrote:
On Thu, Aug 21, 2008 at 02:58:24PM -0700, sab wrote:
I have been working on a python script to parse a continuously growing
log file on a UNIX server.

If you weren't aware, there are already a plethora of tools which do
this...  You might save yourself the trouble by just using one of
those.  Try searching for something like "parse log file" on google or
freshmeat.net or whatever...

The input is the standard in, piped in from the log file.  The
application works well for the most part, but the problem is when
attempting to continuously pipe information into the application via
the tail -f command.  The command line looks something like this:

tail -f <logfile> | grep <search string> | python parse.py

The pipe puts STDIN/STDOUT into "fully buffered" mode, which results
in the behavior you're seeing.  You can set the buffering mode of
those files in your program, but unfortunately tail and grep are not
your program...  You might get this to work by setting stdin to
non-blocking I/O in your Python program, but I don't think it will be
that easy...
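Python's own side of the buffering, at least, is easy to control: flush after every line you pass along. A minimal sketch (the `relay` helper name is mine; the delay you see usually comes from grep's output buffering, which this can't fix):

```python
def relay(instream, outstream):
    # Pass each line along as soon as it arrives, flushing so the
    # next process in the pipe sees it immediately instead of
    # waiting for a full buffer.
    for line in instream:
        outstream.write(line)
        outstream.flush()
```

You would hook it up in parse.py with something like relay(sys.stdin, sys.stdout).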

You can get around this in a couple of ways.  One is to call tail and
grep from within your program, using something like os.popen()...
Then set the blocking mode on the resulting files.  You'll have to
feed the output of one to the input of the other, then read the output
of grep and parse that.  Yucky.  That method isn't very efficient,
since Python can do everything that tail and grep are doing for you...
So I'd suggest you read the file directly in your python program, and
use Python's regex parsing functionality to do what you're doing with
grep.
As for how to actually do what tail does, I'd suggest looking at the
source code for tail to see how it does what it does.
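A rough sketch of that approach, a pure-Python tail -f plus grep, might look like this (the `follow` name and the polling interval are my own choices; file truncation and rotation are not handled):

```python
import re
import time

def follow(path, pattern, poll_interval=1.0):
    # Yield lines appended to the file that match the pattern --
    # roughly `tail -f <path> | grep <pattern>` in pure Python.
    regex = re.compile(pattern)
    with open(path) as f:
        f.seek(0, 2)                       # start at end of file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_interval)  # nothing new yet; wait
                continue
            if regex.search(line):
                yield line
```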
But, if I were you, I'd just download something like swatch, and be
done with it. :)




--
http://mail.python.org/mailman/listinfo/python-list

================================
I have to agree with Derek about using Python as the control here. Pipe or otherwise redirect the incoming data to Python. If the incoming data is buffered, the program terminates only by force (being killed, a system shutdown, or a crash).

Python's  print >>file, str           (see Python's lib.pdf)
acts like incoming | tee -a file in the sense of double output: one copy to a file and one to standard out. str can be a .read() on stdin; as long as it is a string, print doesn't care how it got there.
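In sketch form (using the Python 3 spelling print(line, file=...); `tee_line` is just an illustrative name):

```python
def tee_line(line, logfile, out):
    # Send one copy of the line to the log file and one to the
    # second stream -- the same double output as `tee -a`.
    print(line, file=logfile)
    print(line, file=out)
    logfile.flush()
    out.flush()
```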

Depending on choice (per Unix):
incoming | tee -a logfile | program.py
incoming | program.py (copy all to (log)file) | programsub1.py
  with all parsing in the .py's

The advantage is that Python can keep the buffers, and thus the programs, open and running whether or not data happens to be in the pipe at the moment. This way the logfile gets a full data set and is not disturbed further; there is no trying to determine where the last record read is located.
                    OR
Last time I looked, the syslog configuration did allow the use of named pipes (which are first in, first out: FIFO). This lets pgm.py read the named pipe, append everything it reads to the log, parse each line as desired, sleep for a while when the pipe is empty, and go again. Once more, sequence is maintained; no digging to find the last tested input.
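A sketch of that named-pipe reader (the `pump` helper name is mine; the FIFO itself is assumed to have been created with mkfifo, and the reopen-and-sleep outer loop is shown in comments):

```python
def pump(pipe, log, handle_line):
    # Copy every line from the (named) pipe into the log file and
    # hand it to the parser; returns when the writer closes its end.
    for line in pipe:
        log.write(line)
        log.flush()
        handle_line(line)

# Outer loop, in sketch form: reopen the FIFO after each writer
# goes away, sleeping when the pipe is empty.
#
# while True:
#     with open("/path/to/fifo") as pipe, open("parse.log", "a") as log:
#         pump(pipe, log, parse_line)
#     time.sleep(1.0)
```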



Hope this helps.

Steve
[EMAIL PROTECTED]

