kepes.krisztian wrote:
> Hi!
>
> I want to extract information from an HTML file, but I need to match all
> characters except <.
> "All characters" means everything above chr(31), including those above
> chr(128) (Hungarian accents).
> The .* is very greedy; it eats the < characters too.
>
> If I could use a "not", I could simply define a regexp like:
> [not<]*</a>
>
> It would get everything in the href.
>
> I wrote this program, but I think it is too complex:
>
> import re
>
> l=[]
> for i in range(33,65):
>     if i<>ord('<') and i<>ord('>'):
>         l.append('\\'+chr(i))
> s='|'.join(l)
> all='\w|\s|\%s-\%s|%s'%(chr(128),chr(255),s)
> sre='<Subj>([%s]{1,1024})</d>'%all
> #sre='<Subj>([?!\\<]{1,1024})</d>'
> s='<Subj>xmvccv มมม sdfkdsfj eirfie</d><A></d>'
>
>
> print sre
> print s
> cp=re.compile(sre)
> m=cp.search(s)
> print m.groups()
>
> Does Python have a regexp exclusion or "not" function? How do I use it?
>
> Thanks for the help,
> kk

You could try these regexps or variants thereof:

"<Subj>([^<]*)"

A leading '^' inside the brackets negates the character set, so it matches
any character except those listed after the '^'.

"<Subj>(.*?)<"

The '?' makes the preceding '*' non-greedy, i.e. it matches as few
characters as possible, so the following '<' will match the first '<'
character encountered in the string to be searched.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
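
For what it's worth, here is a minimal sketch of both patterns side by side, with the greedy form added for comparison. It assumes Python 3 syntax (the quoted program uses Python 2), and the sample string is a hypothetical one modeled on the string in the question, with a Hungarian word standing in for the accented text.

```python
import re

# Hypothetical sample modeled on the string in the question; the accented
# word is a stand-in for the Hungarian text.
s = '<Subj>xmvccv árvíztűrő sdfkdsfj eirfie</d><A></d>'

# Negated character class: capture a run of characters other than '<'.
negated = re.compile(r'<Subj>([^<]*)')

# Non-greedy quantifier: capture as little as possible before the next '<'.
lazy = re.compile(r'<Subj>(.*?)<')

# Greedy quantifier, for comparison: runs on to the last '<' in the string.
greedy = re.compile(r'<Subj>(.*)<')

negated_text = negated.search(s).group(1)
lazy_text = lazy.search(s).group(1)
greedy_text = greedy.search(s).group(1)

print(negated_text)  # text up to the first '<'
print(lazy_text)     # same text as above
print(greedy_text)   # also swallows '</d><A>'
```

One difference to keep in mind: by default '.' does not match a newline, while '[^<]' does, so the two patterns only behave differently when the captured text spans several lines (unless re.DOTALL is passed for the '.' version).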