I don't think a Bayesian classifier is going to be very helpful here, unless you have tens of thousands of examples to feed it, or unless it was specially coded to first break addresses into better tokens for classification (such as alphanumeric strings and numbers).
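To make the tokenizing idea concrete, here is a rough sketch of what "breaking addresses into better tokens" could look like. The names token_expr and tokenize_host are made up for illustration, and collapsing every number into a "<num>" placeholder is my own assumption about what would help a classifier generalize; it is not part of any particular Bayesian package.

```python
import re

# Runs of letters or runs of digits; separators such as "." and "-" are dropped.
token_expr = re.compile(r"[a-z]+|\d+")

def tokenize_host(host):
    """Split a hostname into alphabetic tokens and numeric placeholders.

    Hypothetical helper: every run of digits becomes the single token
    "<num>", so hosts that differ only in their numbers look the same.
    """
    tokens = []
    for token in token_expr.findall(host.lower()):
        tokens.append("<num>" if token.isdigit() else token)
    return tokens
```

The resulting token list, rather than the raw hostname, is what you would hand to the classifier.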
The series of if host.find(...) lines in is_dynip() is equivalent to a regular expression, but much more expensive to execute because of all the list slicing, and it doesn't benefit from the re module's fast native implementation of regular expressions. Try building a host_expr (as per my previous post) in the following way:

    import re

    # Suppose dynamic_host_list is a list of all the host strings already
    # known to be dynamic.
    host_patterns = set()  # a set guarantees uniqueness
    number_expr = re.compile(r"\d+")
    for dynamic_host in dynamic_host_list:
        # Escape the literal parts first (the dots especially), then let
        # every run of digits match any other run of digits. The lambda
        # keeps re.sub from interpreting "\d" as a replacement escape.
        pattern = number_expr.sub(lambda m: r"\d+", re.escape(dynamic_host))
        host_patterns.add("^" + pattern + "$")

    host_expr = re.compile("|".join(host_patterns))

This will catch any hostname that differs only in its numbers from a host you've already classified.

For IP addresses, you really just need a mechanism to filter blocks of IP addresses. It might be easiest to first convert them into hex and then make liberal use of [0-9a-f] in regular expressions.
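As a rough sketch of the hex idea for IP addresses: convert each dotted-quad address to eight hex digits, then describe each block as a fixed hex prefix followed by a hex-digit wildcard. The helper names ip_to_hex and is_dynamic_ip are made up, and the three blocks are just the reserved documentation ranges standing in for whatever blocks you actually want to filter.

```python
import re

def ip_to_hex(ip):
    """Convert a dotted-quad IPv4 address to 8 lowercase hex digits."""
    return "".join("%02x" % int(octet) for octet in ip.split("."))

# Placeholder blocks (documentation ranges, not real dynamic pools):
#   192.0.2.0/24    -> "c00002" + any two hex digits
#   198.51.100.0/24 -> "c63364" + any two hex digits
#   203.0.113.0/28  -> "cb00710" + any one hex digit
dynamic_ip_expr = re.compile(
    r"^(?:c00002[0-9a-f]{2}|c63364[0-9a-f]{2}|cb00710[0-9a-f])$"
)

def is_dynamic_ip(ip):
    """Return True if the address falls inside one of the listed blocks."""
    return dynamic_ip_expr.match(ip_to_hex(ip)) is not None
```

One caveat: a hex-digit wildcard covers exactly four bits, so this only expresses blocks whose prefix length is a multiple of 4 (/16, /20, /24, /28 and so on). Other prefix lengths need a narrower character class such as [0-7] for the partial digit.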