Hi!
I want to extract information from an HTML file, but I need to match all characters except <. By "all characters" I mean everything above chr(31), including those above chr(128) (Hungarian accented letters). The .* pattern is very greedy: it consumes the < characters too.
If I could use a "not", I would simply define a regexp like [not<]*</a>
That would get everything in the href.
I wrote this program, but I think it is too complex:

import re

l=[]
for i in range(33,65):
    if i<>ord('<') and i<>ord('>'):
        l.append('\\'+chr(i))
s='|'.join(l)
all='\w|\s|\%s-\%s|%s'%(chr(128),chr(255),s)
sre='<Subj>([%s]{1,1024})</d>'%all
#sre='<Subj>([?!\\<]{1,1024})</d>'
s='<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

print sre
print s
cp=re.compile(sre)
m=cp.search(s)
print m.groups()

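
For comparison, here is a sketch of the shorter form I am hoping for. It assumes that a character class starting with ^, such as [^<], means "any single character except <"; I have not confirmed that this is the intended way to write it. Note that it does not filter out control characters below chr(32); something like [^<\x00-\x1f] might be needed for that.

```python
import re

# Same sample string as in the longer programme.
s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

# Assumption: [^<] matches any one character that is not '<'.
# {1,1024} keeps the same length limit as the longer version.
cp = re.compile('<Subj>([^<]{1,1024})</d>')
m = cp.search(s)

# The first group should hold the text between <Subj> and the first </d>.
captured = m.group(1)
print(captured)
```

Because the class cannot match <, the match has to stop at the first </d>, so it should not run on to the second one.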
Does Python's regexp syntax have an exclusion or "not" operator? How do I use it?
Thanks for the help: kk