Splitting on a word

qwweeeit Wed, 13 Jul 2005 06:25:31 -0700

Hi all,
I am writing a script to visualize (and print)
the web references hidden in the html files as:
'<a href="web reference"> underlined reference</a>'
Optimizing my code, I found that an essential step is:
splitting on a word (in this case 'href').


I am asking if there is some alternative (more pythonic...):

# SplitMultichar.py

import re

# string s simulating an html file
s='ffy: ytrty <a href="www.python.org">python</a> fyt <A
HREF="wwwx">wx</A>  dtrtf'
p=re.compile(r'\bhref\b',re.I)

lHref=p.findall(s)  # lHref=['href','HREF']
# for normal html files the lHref list has more elements
#     (more web references)

c='~' # char to be used as delimiter
# c=chr(127) # char to be used as delimiter
for i in lHref:
    s=s.replace(i,c)

# s ='ffy: ytrty <a ~="www.python.org">python</a> fyt <A
~="wwwx">wx</A>  dtrtf'

list=s.split(c)
# list=['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ',
'="wwwx">wx</A>  dtrtf']
#=-----------------------------------------------------

If you save the original s string to xxx.html, any browser
can visualize it.
To be sure as delimiter I choose chr(127)
which surely is not present in the html file.
Bye.

-- 
http://mail.python.org/mailman/listinfo/python-list

Splitting on a word

Reply via email to