Hi list, I'm trying to use regular expressions to quickly extract the contents of messages that my application will receive. I have worked out most of the regex, but the last section of the message has me stumped, mostly because I want to pull the content out into regex groups that I can easily access later. I have a regex that matches the key/value pairs, but the match ends up holding only the contents of the last key/value pair encountered.
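The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the pattern and the names PAIR_LINES and last_pair are made up for illustration and are much simpler than the real xPL regex.

```python
import re

# Hypothetical, cut-down pattern: one or more "key=value\n" lines,
# captured by a single pair of named groups inside a repeated group.
PAIR_LINES = re.compile(r'(?:(?P<key>[a-z]+)=(?P<value>[a-z0-9]+)\n)+')


def last_pair(text):
    """Return the (key, value) that the match object retains for text."""
    match = PAIR_LINES.match(text)
    return match.group('key'), match.group('value')


# All three lines are matched, but only the final capture is kept.
print(last_pair('foo=bar\nanother=42\noption=7\n'))  # ('option', '7')
```

The whole input is consumed by the match, yet the named groups hold only what the final repetition captured.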
An example of the section of the message that is troubling me looks like this:

{
option=value
foo=bar
another=42
option=7
}

So it's basically a bunch of lines, each terminated with a '\n' character. The number of key/value fields varies with the particular message. Also notice that there are two 'option' keys. This is allowed and I need to cater for it.

A couple of example messages are:

xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=*\n}\nhbeat.basic\n{\ninterval=10\n}\n

xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=vendor-device.instance\n}\nconfig.list\n{\nreconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n}\n

As all messages follow the same pattern, I'm hoping to develop one generic regex that can pull a message from a received packet, instead of one regex per message kind, because there are many kinds. The regex I came up with looks like this:

import re

# This should match any xPL message

GROUP_MESSAGE_TYPE = 'message_type'
GROUP_HOP = 'hop'
GROUP_SOURCE = 'source'
GROUP_TARGET = 'target'
GROUP_SRC_VENDOR_ID = 'source_vendor_id'
GROUP_SRC_DEVICE_ID = 'source_device_id'
GROUP_SRC_INSTANCE_ID = 'source_instance_id'
GROUP_TGT_VENDOR_ID = 'target_vendor_id'
GROUP_TGT_DEVICE_ID = 'target_device_id'
GROUP_TGT_INSTANCE_ID = 'target_instance_id'
GROUP_IDENTIFIER_TYPE = 'identifier_type'
GROUP_SCHEMA = 'schema'
GROUP_SCHEMA_CLASS = 'schema_class'
GROUP_SCHEMA_TYPE = 'schema_type'
GROUP_OPTION_KEY = 'key'
GROUP_OPTION_VALUE = 'value'

XplMessageGroupsRe = r'''(?P<%s>xpl-(cmnd|stat|trig))\n    # message type
\{\n
hop=(?P<%s>[1-9]{1})\n    # hop count
source=(?P<%s>(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16}))\n    # source identifier
target=(?P<%s>(\*|(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16})))\n    # target identifier
\}\n
(?P<%s>(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,8}))\n    # schema
\{\n
(?:(?P<%s>[a-z0-9\-]{1,16})=(?P<%s>[\x20-\x7E]{0,128})\n){1,64}    # key/value pairs
\}\n''' % (GROUP_MESSAGE_TYPE,
           GROUP_HOP,
           GROUP_SOURCE,
           GROUP_SRC_VENDOR_ID,
           GROUP_SRC_DEVICE_ID,
           GROUP_SRC_INSTANCE_ID,
           GROUP_TARGET,
           GROUP_TGT_VENDOR_ID,
           GROUP_TGT_DEVICE_ID,
           GROUP_TGT_INSTANCE_ID,
           GROUP_SCHEMA,
           GROUP_SCHEMA_CLASS,
           GROUP_SCHEMA_TYPE,
           GROUP_OPTION_KEY,
           GROUP_OPTION_VALUE)

XplMessageGroups = re.compile(XplMessageGroupsRe, re.VERBOSE | re.DOTALL)

If I pass the second example message through this regex, the 'key' group ends up containing 'option' and the 'value' group ends up containing 'filter[16]', which is the last key/value pair in that message.

So the problem lies in the key/value section of the regex. It matches multiple occurrences of the pattern but writes each one into the same key/value groups, so I can't extract and access all the fields.

Is there some other way to do this which allows me to store all the key/value pairs in the regex match object for later retrieval? Perhaps using the standard unnamed numbered groups?

Thanks,
Chris

--
http://mail.python.org/mailman/listinfo/python-list
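One possible way around this, offered as a sketch rather than a definitive answer: Python's re module keeps only the last capture of a repeated group, whether the group is named or numbered, so a second pass over the key/value block may be the simplest route. The names BODY_RE, PAIR_RE and extract_pairs below are hypothetical, and the body pattern is a simplified assumption that covers only the schema line and the final brace-delimited section, reusing the character classes from the regex above.

```python
import re

# Assumed, simplified pattern for the last section of a message:
# a schema line followed by a brace-delimited block of key=value lines.
# The whole block is captured as one group instead of per-pair groups.
BODY_RE = re.compile(
    r'(?P<schema>[a-z0-9]{1,8}\.[a-z0-9]{1,8})\n'
    r'\{\n'
    r'(?P<body>(?:[a-z0-9\-]{1,16}=[\x20-\x7E]{0,128}\n){1,64})'
    r'\}\n'
)

# Second pass: one key=value line at a time.
PAIR_RE = re.compile(
    r'(?P<key>[a-z0-9\-]{1,16})=(?P<value>[\x20-\x7E]{0,128})\n'
)


def extract_pairs(message):
    """Return (schema, [(key, value), ...]), or None if nothing matches."""
    match = BODY_RE.search(message)
    if match is None:
        return None
    # findall returns one (key, value) tuple per line, in message order,
    # so duplicate keys such as 'option' are all preserved.
    pairs = PAIR_RE.findall(match.group('body'))
    return match.group('schema'), pairs


message = ('xpl-stat\n{\nhop=1\nsource=vendor-device.instance\n'
           'target=vendor-device.instance\n}\nconfig.list\n{\n'
           'reconf=newconf\noption=interval\noption=group[16]\n'
           'option=filter[16]\n}\n')
schema, pairs = extract_pairs(message)
print(schema)
for key, value in pairs:
    print(key, value)
```

A list of tuples is used rather than a dict so that repeated keys are not collapsed. In the full regex, the same idea would mean wrapping the repeated key/value section in a single capturing group and running the pair pattern over that group's text afterwards.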