On Mar 5, 7:00 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
> Jean-Michel Pichavant <jeanmic...@sequans.com> wrote:
> > And tell me how not using regexp will ensure the /etc/hosts processing
> > is correct ? The non regexp solutions provided in this thread did not
> > handle what you rightfully pointed out about host list and commented
> > lines.
>
> It won't make it automatically correct, but I'd guess that written without
> being so dependent on regexes might have made someone point out those
> deficiencies sooner. The point being that casual readers of the code won't
> take the time to decode the regex, they'll glance over it and assume it
> does something or other sensible.
>
> If I was writing that code, I'd read each line, strip off comments and
> leading whitespace (so you can use re.match instead of re.search), split on
> whitespace and take all but the first field. I might check that the field
> I'm ignoring is something like a numeric ip address, but if I did want to
> do that then I'd include range checking for valid octets so still no regex.
>
> The whole of that I'd wrap in a generator so what you get back is a
> sequence of host names.
>
> However that's just me. I'm not averse to regular expressions, I've written
> some real mammoths from time to time, but I do avoid them when there are
> simpler clearer alternatives.
>
> > And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> > ...
> > Ok that's totally unfair :D You're right I made a mistake. Still the
> > comment is absolutely required (provided it's correct).
>
> Yes, the comment would have been good had it been correct. I'd also go for
> a named group as that provides additional context within the regex.
>
> Also if there are several similar regular expressions in the code, or if
> they get too complex I'd build them up in parts. e.g.
>
> OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
> ADDRESS = (OCTET + r'\.') * 3 + OCTET
> HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
>   # could use \S+ but my Linux manual says
>   # alphanumeric, dash and dots only
> ... and so on ...
>
> which provides another way of documenting the intentions of the regex.
>
> BTW, I'm not advocating that here, the above patterns would be overkill,
> but in more complex situations that's what I'd do.
>
> --
> Duncan Booth http://kupuguy.blogspot.com
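
For what it's worth, here is a rough sketch of how I read the regex-free approach described in the quoted message: strip comments, split on whitespace, range-check the octets, and wrap it all in a generator. The names iter_hostnames, is_ipv4 and SAMPLE are my own inventions for illustration, and skipping everything that isn't a dotted IPv4 address (IPv6 lines included) is my assumption, not something Duncan specified.

```python
def is_ipv4(text):
    """Return True if text looks like a dotted quad with octets in 0..255."""
    parts = text.split('.')
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts
    )


def iter_hostnames(lines):
    """Yield every host name and alias found in hosts-file style lines."""
    for line in lines:
        # Drop anything after '#', then split the rest on whitespace.
        fields = line.split('#', 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address with no names
        address, names = fields[0], fields[1:]
        if not is_ipv4(address):
            continue  # assumption: silently skip non-IPv4 lines
        for name in names:
            yield name


# Hypothetical sample data, not taken from the thread.
SAMPLE = """\
# static table
127.0.0.1   localhost
192.168.200.1  foo123 foo123.example.com  # gateway
   10.0.0.300 bogus
::1 ip6-localhost
"""

print(list(iter_hostnames(SAMPLE.splitlines())))
# prints ['localhost', 'foo123', 'foo123.example.com']
```

Passing open('/etc/hosts') instead of SAMPLE.splitlines() should work the same way, since the generator only needs an iterable of lines.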
All good comments here. My takeaway is that my lazy style of writing
regexes makes them hard for anyone who isn't a regex fiend to read,
whatever the language, and that there are ways to make them much more
readable to the untrained eye.

Duncan, I like your method of defining sections of the regex outside
the regex itself, even when each piece is used only once.

--
http://mail.python.org/mailman/listinfo/python-list
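
P.S. Below is a sketch of what the build-it-in-parts style might look like once the pieces are assembled and given named groups. HOSTS_LINE and parse_hosts_line are names I made up, and the overall line layout (address, one or more names, optional trailing comment) is my assumption. I also listed the OCTET alternatives longest-first and added re.ASCII; that is my own tweak, so that an unanchored match can't stop after a single digit and \d only matches 0-9.

```python
import re

# Building blocks, defined outside the final pattern so each has a name.
OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)'
ADDRESS = r'(?:' + OCTET + r'\.){3}' + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'

# Assumed layout: address, one or more names, optional '#' comment.
HOSTS_LINE = re.compile(
    r'\s*(?P<address>' + ADDRESS + r')'
    + r'(?P<names>(?:[ \t]+' + HOSTNAME + r')+)'
    + r'[ \t]*(?:#.*)?$',
    re.ASCII,
)


def parse_hosts_line(line):
    """Return (address, [names]) for a well-formed line, else None."""
    m = HOSTS_LINE.match(line)
    if m is None:
        return None
    return m.group('address'), m.group('names').split()


print(parse_hosts_line('192.168.200.1  foo123 foo123.example.com  # gateway'))
print(parse_hosts_line('192.168.200.1 (foo123)'))
```

With this sketch the first call returns the address plus both names, and the '(foo123)' line from earlier in the thread is rejected because parentheses are not part of HOSTNAME.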