On 08Oct2013 10:59, Skip Montanaro <s...@pobox.com> wrote:
| > Aiui apache log format uses space as delimiter, encapsulates strings in
| > '"' characters, and uses '-' as an empty field.
|
| Specifying the field delimiter as a space, you might be able to use
| the csv module to read these. I haven't done any Apache log file work
| since long before the csv module was available, but it just might
| work.

You can definitely do this. I pull things out of Apache log files using
awk in exactly this fashion. It does rely on each of the "real" fields
having a fixed number of "words" in it; you just stick the words of a
field back together again.

And also in Python. I've got a merge-apache-logs script to read multiple
logs, each presumed to be in time order, and produce a single output
stream for passing to log analysis tools:

  https://bitbucket.org/cameron_simpson/css/src/tip/bin/merge-apache-logs

It is a bit of a hack, but useful. It has an "aptime" function to pull
and parse the time field from the line, which starts like this:

    from datetime import datetime

    def aptime(logline, zones, defaultZone):
      ''' Compute a datetime object from the supplied Apache log line.
          `defaultZone` is the timezone to use if it cannot be deduced.
      '''
      # whitespace split: the timestamp lands in fields[3] and fields[4]
      fields = logline.split()
      if len(fields) < 5:
        ##warning("bad log line: %s", logline)
        return None

      dt = None
      tzinfo = None

      # try for desired "[DD/Mon/YYYY:HH:MM:SS +hhmm]" format
      humantime, tzinfo = fields[3], fields[4]
      if len(humantime) == 21 \
      and humantime.startswith('[') \
      and tzinfo.endswith(']'):
        try:
          dt = datetime.strptime(humantime, "[%d/%b/%Y:%H:%M:%S")
        except ValueError:
          dt = None
        if dt is None:
          tzinfo = None
        else:
          # strip the trailing ']' from the "+hhmm]" word
          tzinfo = tzinfo[:-1]

and proceeds otherwise (we have a few different log formats in play,
alas).

So regexps are not your only choice here, and possibly not even the best
choice.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

This is not a bug. It's just the way it works, and makes perfect sense.
        - Tom Christiansen <tchr...@jhereg.perl.com>
I like that line. I hope my boss falls for it.
        - Chaim Frenkel <cha...@cris.com>
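
For comparison, here is a minimal sketch of the csv approach Skip suggests, combined with the "stick the words back together" step described above. It is not taken from merge-apache-logs: the function name `parse_apache_lines`, the sample line, and the choice to return `(host, when, rest)` tuples are all assumptions made for illustration, and it presumes Python 3 and a common/combined style log line where the timestamp is the 4th and 5th space-separated word.

```python
import csv
from datetime import datetime


def parse_apache_lines(lines):
    """Yield (host, when, rest) for each parseable log line.

    Hypothetical helper, for illustration only. `lines` is any iterable
    of strings, such as an open log file.
    """
    # Space-delimited, with '"' protecting spaces inside quoted fields.
    reader = csv.reader(lines, delimiter=' ', quotechar='"')
    for fields in reader:
        if len(fields) < 5:
            # Too short to hold a timestamp; skip it.
            continue
        # The bracketed timestamp is not quoted, so it arrives as two
        # "words": "[DD/Mon/YYYY:HH:MM:SS" and "+hhmm]". Rejoin them.
        stamp = fields[3] + ' ' + fields[4]
        try:
            when = datetime.strptime(stamp, "[%d/%b/%Y:%H:%M:%S %z]")
        except ValueError:
            when = None
        # Treat a bare '-' as an empty field.
        rest = [None if f == '-' else f for f in fields[5:]]
        yield fields[0], when, rest


# Made-up sample line in the usual combined layout.
sample = [
    '192.0.2.1 - - [08/Oct/2013:10:59:00 +1100] '
    '"GET /index.html HTTP/1.1" 200 1234 "-" "curl/7.0"\n'
]
for host, when, rest in parse_apache_lines(sample):
    print(host, when, rest)
```

Two caveats with this sketch: `%b` depends on the locale for month names, and lines containing backslash-escaped quotes inside a quoted field may need the reader's `escapechar` option or different handling altogether.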