On 29/08/2010 15:22, naugiedoggie wrote:
Hello,
I'm having a problem with using a function as the replacement in
re.sub().
Here is the function:
def normalize(s) :
return
urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))
This normalises the provider and returns only that, and none of the
remainder of the string.
I think you might want this:
def normalize(s):
return s[ : s.start('provider')] +
urllib.quote(string.capwords(urllib.unquote(s.group('provider')))) +
s[s.start('provider') : ]
It returns the part before the provider, followed by the normalised
provider, and then the part after the provider.
The purpose of this function is to proper-case the words contained in
a URL query string parameter value. I'm massaging data in web log
files.
In case it matters, the regex pattern looks like this:
provider_pattern = r'(?P<search>Search_Provider)=(?P<provider>[^&]+)'
The call looks like this:
<code>
re.sub(matcher,normalize,line)
</code>
Where line is the log line entry.
What I get back is first the entire line with the normalization of the
parameter value, but missing the parameter; then appended to that
string is the entire line again, with the query parameter back in
place pointing to the normalized string.
<code>
fileReader = open(log,'r')
lines = fileReader.readlines()
for line in lines:
if line.find('Search_Type') != -1 and line.find('Search_Provider') !=
-1 :
These can be replaced by:
if 'Search_Type' in line and 'Search_Provider' in line:
re.sub(provider_matcher,normalize,line)
re.sub is returning the result, which you're throwing away!
line = re.sub(provider_matcher,normalize,line)
print line,'\n'
</code>
The output of the print is like this:
<code>
'log-entry parameter=value&normalized-string¶meter=value\n
log-entry parameter=value¶meter=normalized-string¶meter=value'
</code>
The goal is to massage the specified entries in the log files and
write the entire log back into a new file. The new file has to be
exactly the same as the old one, with the exception of the entries
I've altered with my function.
No doubt I'm doing something trivially wrong, but I've tried to
reproduce the structure as defined in the documentation.
--
http://mail.python.org/mailman/listinfo/python-list