Hi, I'm such a newbie in Python that I haven't got the right idea about regular expressions yet. This is what I want to do: use Python to extract some information from wikitext and then replace those expressions with others. Using wikitext as a base, this is what I do:
<code file="parse.py">
from __future__ import print_function  # print() behaves the same on Python 2 and 3

import re

paragraphs = """
= Test '''wikitest'''=
[[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"]]

[http://www.google.com.cu]
::''Note: This is just an example to test some regular expressions stuffs.''

The ''wikitext'' is a text format that helps a lot. In concept is a simple [[markup]] [[programming_language|language]]. That helps to make simple create documentations texts.

==Wikitext==

Created by Warn as a ...

<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>
""".split('\n\n')

wikipatterns = {
    'a_nowiki':   re.compile(r"<nowiki>(.\S+)</nowiki>"),  # nowiki
    'section':    re.compile(r"\=(.*)\="),                  # section one tags
    'sectiontwo': re.compile(r"\=\=(.*?)\=\="),             # section two tags
    'wikilink':   re.compile(r"\[\[(.*?)\]\]"),             # links tags
    'link':       re.compile(r"\[(.*?)\]"),                 # external links tags
    'italic':     re.compile(r"\'\'(.*?)\'\'"),             # italic text tags
    'bold':       re.compile(r"\'\'\'(.*?)\'\'\'"),         # bold text tags
}

for pattern in wikipatterns:
    print("===> processing pattern :", pattern, "<==============")
    for paragraph in paragraphs:
        print(wikipatterns[pattern].findall(paragraph))
</code>
But when I run it, the result is not what I want. It's something like this:
<code>
$ python parser.py
===> processing pattern : bold <==============
['braille']
[]
[]
[]
[]
[]
===> processing pattern : section <==============
[" Test '''wikitest'''"]
[]
[]
['=Wikitext=']
[]
[]
===> processing pattern : sectiontwo <==============
[]
[]
[]
['Wikitext']
[]
[]
===> processing pattern : link <==============
['[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
['http://www.google.com.cu']
['[markup', '[programming_language|language']
[]
[]
['</nowiki> this is a normal <nowiki>sign']
===> processing pattern : italic <==============
["'wikitest"]
['Note: This is just an example to test some regular expressions stuffs.']
['wikitext']
[]
[]
[]
===> processing pattern : wikilink <==============
['Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
[]
['markup', 'programming_language|language']
[]
[]
[]
===> processing pattern : a_nowiki <==============
[]
[]
[]
[]
[]
['sign]']
</code>
In the first case (bold), the result is OK.
In the second (section), the first result is OK, but the second is not, because "==Wikitext==" is a level-two section, not a level-one section.
In the third (sectiontwo), things are OK.
In the fourth (link), the first and third results are wrong because they are level-two (wiki) links, but the second is OK.
The fifth is OK.
The last one (a_nowiki) shows only one result, and it should show two.

Please help.

PS: I'm really sorry about my technical English.

--
Michel Perez
Ulrico Software Group
"Nothing is so hard and difficult that the human mind cannot conquer it." (Seneca)

Red Telematica de Salud - Cuba
CNICM - Infomed

--
http://mail.python.org/mailman/listinfo/python-list
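One possible way to get the results described above is sketched below. It is written for Python 3 and reuses the sample text from the question (the line breaks inside each paragraph are a guess, since they were lost in the post). It is not the only fix: the patterns are kept in a list so their order is fixed (a plain dict has no guaranteed order before Python 3.7), `<nowiki>` spans are pulled out first so their brackets are not seen as links, headings are anchored to whole lines, and lookarounds (`(?<!...)` and `(?!...)`) stop `=` from matching `==`, `[` from matching `[[`, and `''` from matching `'''`. The names `WIKITEXT`, `NOWIKI`, `WIKIPATTERNS` and `extract()` are made up for this example, and it only handles the simple markup in this sample, not full MediaWiki syntax (bold italic, nested links, and so on).

```python
import re

# Sample text copied from the question; the line breaks inside each
# paragraph are an assumption, because they were lost in the post.
WIKITEXT = """= Test '''wikitest'''=
[[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"]]

[http://www.google.com.cu]
::''Note: This is just an example to test some regular expressions stuffs.''

The ''wikitext'' is a text format that helps a lot. In concept is a simple [[markup]] [[programming_language|language]]. That helps to make simple create documentations texts.

==Wikitext==

Created by Warn as a ...

<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>"""

# (.*?) is lazy and accepts any content, even a single "[";
# (.\S+) needs at least two characters, so "<nowiki>[</nowiki>" was missed.
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>")

# A list of (name, pattern) pairs keeps the order fixed.
WIKIPATTERNS = [
    # MULTILINE makes ^ and $ match at line boundaries; the lookarounds stop
    # "=" from matching a "==" heading (and "==" from matching "===").
    ("section", re.compile(r"^=(?!=)(.*?)(?<!=)=\s*$", re.MULTILINE)),
    ("sectiontwo", re.compile(r"^==(?!=)(.*?)(?<!=)==\s*$", re.MULTILINE)),
    ("wikilink", re.compile(r"\[\[(.*?)\]\]")),
    # A "[" not preceded by "[" and a "]" not followed by "]", with no
    # brackets in between, so [[...]] is never taken as an external link.
    ("link", re.compile(r"(?<!\[)\[([^\[\]]*)\](?!\])")),
    ("bold", re.compile(r"'''(.*?)'''")),
    # Exactly two quotes on each side, so '''bold''' is not read as italic.
    ("italic", re.compile(r"(?<!')''(?!')(.*?)(?<!')''(?!')")),
]


def extract(paragraph):
    """Return a dict mapping each pattern name to its matches in paragraph."""
    found = {"nowiki": NOWIKI.findall(paragraph)}
    # Remove <nowiki> spans first so the brackets inside them are ignored.
    plain = NOWIKI.sub("", paragraph)
    for name, pattern in WIKIPATTERNS:
        found[name] = pattern.findall(plain)
    return found


if __name__ == "__main__":
    for paragraph in WIKITEXT.split("\n\n"):
        print(extract(paragraph))
```

For the replace step, a compiled pattern's `sub()` method takes a replacement string in which `\1` stands for the first captured group, so something like `re.compile(r"'''(.*?)'''").sub(r"<b>\1</b>", text)` would wrap each bold span in HTML tags (HTML is only an example target).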