I'm trying to reformat an Apache log file that was written with a custom output format, converting it to W3C format with a Python script. The problem I'm having is the field-to-field matching. In my Python code I call split with a space as the delimiter, but that fails when it reaches the user agent, because that field itself contains spaces. The user agent is enclosed in double quotes, though. So is there a way to split on a given delimiter without splitting inside quoted strings?
For example, a line might look like:

2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200 1923 1360 31715 -
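
One possible approach, offered as a minimal sketch rather than a definitive answer: the standard library's csv module can treat a space as the delimiter and a double quote as the quote character, so a quoted user agent comes back as a single field. The helper name split_log_line is hypothetical, the sample user agent is shortened for readability, and the sketch assumes fields are separated by exactly one space and that no field contains an unescaped double quote.

```python
import csv

def split_log_line(line):
    """Split on single spaces, keeping double-quoted text as one field."""
    # csv.reader expects an iterable of lines, so wrap the single line in a list.
    reader = csv.reader([line], delimiter=" ", quotechar='"')
    return next(reader)

# Shortened version of the sample line (assumption: same field layout).
sample = (
    '2009-09-29 12:00:00 - GET / '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)" '
    'http://somehost.com 200 1923 1360 31715 -'
)

fields = split_log_line(sample)
print(len(fields))   # number of fields found
print(fields[5])     # the user agent, with the surrounding quotes removed
```

Note that two consecutive spaces would produce an empty field with this approach, so lines with irregular spacing may need extra handling before the fields are mapped to their W3C counterparts.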