On Feb 13, 6:53 am, mathieu <[EMAIL PROTECTED]> wrote:
> I do not understand what is wrong with the following regex expression.
> I clearly mark that the separator in between group 3 and group 4
> should contain at least 2 white space, but group 3 is actually reading
> 3 +4
>
> Thanks
> -Mathieu
>
> import re
>
> line = " (0021,xx0A) Siemens: Thorax/Multix FD Lab Settings
> Auto Window Width SL 1 "
> patt = re.compile("^\s*\(([0-9A-Z]+),([0-9A-Zx]+)\)\s+([A-Za-z0-9./:_
> -]+)\s\s+([A-Za-z0-9 ()._,/#>-]+)\s+([A-Z][A-Z]_?O?W?)\s+([0-9n-]+)\s*
> $")
<snip>

I love the smell of regexes in the morning!

For more legible posting (and general maintainability), try breaking up
your quoted strings like this:

    line = \
        "  (0021,xx0A)   Siemens: Thorax/Multix FD Lab Settings     " \
        "Auto Window Width       SL   1 "

    patt = re.compile(
        r"^\s*"
        r"\("
        r"([0-9A-Z]+),"
        r"([0-9A-Zx]+)"
        r"\)\s+"
        r"([A-Za-z0-9./:_ -]+)\s\s+"
        r"([A-Za-z0-9 ()._,/#>-]+)\s+"
        r"([A-Z][A-Z]_?O?W?)\s+"
        r"([0-9n-]+)\s*$")

Of course, the problem is that you have a greedy match in the part of
the regex that is supposed to stop between "Settings" and "Auto". The
character class for group 3 includes the space character, so the greedy
"+" runs as far to the right as it can, and only backs up to the last
run of whitespace that still lets the rest of the pattern match. Adding
"?" makes the match non-greedy, so it stops at the first run of two or
more whitespace characters. Change patt to:

    patt = re.compile(
        r"^\s*"
        r"\("
        r"([0-9A-Z]+),"
        r"([0-9A-Zx]+)"
        r"\)\s+"
        r"([A-Za-z0-9./:_ -]+?)\s\s+"
        r"([A-Za-z0-9 ()._,/#>-]+)\s+"
        r"([A-Z][A-Z]_?O?W?)\s+"
        r"([0-9n-]+)\s*$")

or if you prefer:

    patt = re.compile(r"^\s*\(([0-9A-Z]+),([0-9A-Zx]+)\)\s+([A-Za-z0-9./:_ -]+?)\s\s+([A-Za-z0-9 ()._,/#>-]+)\s+([A-Z][A-Z]_?O?W?)\s+([0-9n-]+)\s*$")

It looks like you wrote this regex to process this specific input
string - it has a fragile feel to it, as if you will have to go back
and tweak it to handle other data that might come along, such as

    (xx42,xx0A)   Honeywell: Inverse Flitznoid (Kelvin)   80   SL   1

Just out of curiosity, I wondered what a pyparsing version of this
would look like.
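
As an aside before the pyparsing version: the greedy versus non-greedy
difference is easy to check in isolation. This is only a minimal
sketch. The helper name build_patt is made up for illustration, and the
exact number of spaces in the sample line is an assumption (the thread
only says the fields are separated by at least two).

```python
import re

# Sample line; the widths of the space runs are assumed, not taken
# from the original data.
line = ("  (0021,xx0A)   Siemens: Thorax/Multix FD Lab Settings" + " " * 5 +
        "Auto Window Width" + " " * 7 + "SL   1 ")


def build_patt(quantifier):
    """Build the pattern from this thread, varying only the quantifier
    on group 3 ("+" for greedy, "+?" for non-greedy)."""
    return re.compile(
        r"^\s*"
        r"\(([0-9A-Z]+),([0-9A-Zx]+)\)\s+"
        r"([A-Za-z0-9./:_ -]" + quantifier + r")\s\s+"
        r"([A-Za-z0-9 ()._,/#>-]+)\s+"
        r"([A-Z][A-Z]_?O?W?)\s+"
        r"([0-9n-]+)\s*$")


greedy = build_patt("+").match(line)
lazy = build_patt("+?").match(line)

# Both patterns match the line; only the split between groups 3 and 4
# differs.
print("greedy group 3:", repr(greedy.group(3).strip()))
print("greedy group 4:", repr(greedy.group(4).strip()))
print("lazy   group 3:", repr(lazy.group(3).strip()))
print("lazy   group 4:", repr(lazy.group(4).strip()))
```

With this spacing, the greedy version lets group 3 swallow "Auto Window
Width" and leaves group 4 holding nothing but whitespace, while the
non-greedy version stops group 3 at "Settings". Group 4 can still pick
up trailing spaces, because its character class also includes the
space, hence the strip() calls.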

Here is the pyparsing version:

    from pyparsing import Word, hexnums, delimitedList, printables, \
        White, Regex, nums

    line = \
        "  (0021,xx0A)   Siemens: Thorax/Multix FD Lab Settings     " \
        "Auto Window Width       SL   1 "

    # define fields
    hexint = Word(hexnums + "x")
    # words joined by exactly one space form a single text field
    text = delimitedList(Word(printables),
                         delim=White(" ", exact=1),
                         combine=True)
    type_label = Regex("[A-Z][A-Z]_?O?W?")
    int_label = Word(nums + "n-")

    # define line structure - give each field a name
    line_defn = "(" + hexint("x") + "," + hexint("y") + ")" + \
        text("desc") + text("window") + type_label("type") + \
        int_label("int")

    line_parts = line_defn.parseString(line)
    print(line_parts.dump())
    print(line_parts.desc)

Prints:

    ['(', '0021', ',', 'xx0A', ')', 'Siemens: Thorax/Multix FD Lab Settings', 'Auto Window Width', 'SL', '1']
    - desc: Siemens: Thorax/Multix FD Lab Settings
    - int: 1
    - type: SL
    - window: Auto Window Width
    - x: 0021
    - y: xx0A
    Siemens: Thorax/Multix FD Lab Settings

I was just guessing on the field names, but you can see where they are
defined and change them to the appropriate values.

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list