Alessandro wrote:
> Problem with REs....
> I have this string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
> 'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
> REs seem like a good fit for this... (needless to say, I am very much
> a beginner with both REs and Python)
> My question is why, if I run this code
>
> >>> regex=re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
> >>> print regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND")
> I get this output:
>
> >>> ['MINUTE', 'HOUR', 'SECOND']
>
> and not what I expected:
>
> >>> ['3 MINUTE', '22 HOUR', '28 SECOND']
>
> Regards and many thanks...
> Alessandro
>

It would probably have been slightly easier had you written in English, but basically the issue is the matching group.
A match group (capturing group) is defined by the parentheses in the regular expression. Your only group is "(HOUR|MINUTE|SECOND)", and when a pattern contains a group, findall returns what the group matched rather than the whole match. You need to include the number in the group as well, and you can use a non-capturing group for the unit (with (?: ) instead of ( ) ) so that it does not clutter your matched groups.

>>> import re
>>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")

Other improvements:

* \d is a shortcut for "any digit"; for this input it is equivalent to [0-9], yet slightly clearer.
* You may use the re.I (or re.IGNORECASE) flag to match both lowercase and uppercase units.
* You can easily handle an optional "s".

Improved regex:

>>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
>>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
>>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']

If you want to learn more about regular expressions, I suggest browsing http://regular-expressions.info/, which is a good source of information, and trying Kodos, which is quite a good Python regex debugger.

--
http://mail.python.org/mailman/listinfo/python-list
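P.S. A small sketch, in case it helps to see how findall reacts to the number of capturing groups in a pattern. It is only an illustration under my own assumptions: the names sample, whole_match, one_group and parse_duration are made up for this example, and turning the result into a dict goes one step beyond what the original question asked for.

```python
import re

sample = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# No capturing group: findall returns the whole match.
whole_match = re.findall(r"\d+ (?:HOUR|MINUTE|SECOND)", sample)
print(whole_match)  # ['22 MINUTE', '3 HOUR', '28 SECOND']

# One capturing group: findall returns only what the group matched.
one_group = re.findall(r"\d+ (HOUR|MINUTE|SECOND)", sample)
print(one_group)  # ['MINUTE', 'HOUR', 'SECOND']

# Two capturing groups: findall returns (number, unit) tuples.
PATTERN = re.compile(r"(\d+) (hour|minute|second)s?", re.I)


def parse_duration(text):
    """Hypothetical helper: map each unit (lowercased) to its integer count."""
    return {unit.lower(): int(number) for number, unit in PATTERN.findall(text)}


print(parse_duration("3 HOURS, 22 MINUTES, and 28 SECONDS"))
# {'hour': 3, 'minute': 22, 'second': 28}
```

Capturing the number and the unit separately saves a second split on the space if you need the values as integers afterwards.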