cut strings and parse for images

2004-12-06 Thread Andreas Volz
Hi,

I used SGMLParser to parse all hrefs in an HTML file. Now I need to cut
some strings. For example:

http://www.example.com/dir/example.html

Now I'd like to cut the string so that only the domain and directory are
left. Expected result:

http://www.example.com/dir/

I know how to do this in bash, but not in Python. How could this be
done?
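A minimal sketch in modern Python 3 (the thread's own code is Python 2, where the same functions live in the `urlparse` module). The helper name `url_directory` is hypothetical. It splits the URL, keeps the path up to and including its last slash, and drops any query string or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def url_directory(url):
    """Return the URL truncated after the last '/' in its path (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep the path up to and including its last slash; rfind returns -1
    # when there is no slash, so an empty path stays empty.
    path = parts.path[:parts.path.rfind("/") + 1]
    # Rebuild the URL without query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(url_directory("http://www.example.com/dir/example.html"))
# http://www.example.com/dir/
```

A plain `url[:url.rfind("/") + 1]` gives the same result for this example, but splitting first avoids cutting into `http://` when the URL has no path at all.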

The next problem is extracting not only hrefs but also images. An href
is easy:

Install

But an image is a little harder:



This is my current example code:

from sgmllib import SGMLParser

leach_url = "http://stargus.sourceforge.net/"

class URLLister(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    # Called for every <a> tag; collect the value of its href attribute.
    def start_a(self, attrs):
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)

if __name__ == "__main__":
    import urllib
    # Download the page and feed its HTML to the parser.
    usock = urllib.urlopen(leach_url)
    parser = URLLister()
    parser.feed(usock.read())
    parser.close()
    usock.close()
    for url in parser.urls:
        print url
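sgmllib was removed in Python 3, so here is a minimal sketch of the same idea using the standard `html.parser.HTMLParser`, extended to collect `<img src>` values as well and to resolve relative links against the page URL with `urljoin`. The class name `LinkAndImageLister` and the sample markup are hypothetical stand-ins for the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageLister(HTMLParser):
    """Collect <a href> and <img src> values, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []    # targets of <a href="...">
        self.images = []  # sources of <img src="...">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        # Self-closing tags such as <img ... /> go through handle_startendtag,
        # whose default implementation calls handle_starttag, so they are covered.


# Hypothetical sample markup, standing in for the page at leach_url.
sample = '<a href="install.html">Install</a> <img src="images/marine.jpg" alt="">'
parser = LinkAndImageLister("http://stargus.sourceforge.net/")
parser.feed(sample)
parser.close()
print(parser.urls)    # ['http://stargus.sourceforge.net/install.html']
print(parser.images)  # ['http://stargus.sourceforge.net/images/marine.jpg']
```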


Do you have any tips on how to solve these problems?

regards
Andreas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: cut strings and parse for images

2004-12-06 Thread Andreas Volz
On Mon, 06 Dec 2004 20:36:36 GMT, Paul McGuire wrote:

> Check out the urlparse module (in std distribution).  For images, you
> can provide a default addressing scheme, so you can expand
> "images/marine.jpg" relative to the current location.

OK, this looks good. But I'm a real newbie to Python and can't put
together a minimal example myself. Could you please give me a very small
example of how to use urlparse, or point me to one on the web?
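A very small sketch in Python 3, where these functions live in `urllib.parse` (in the Python 2 of this thread they are in the `urlparse` module). It splits a URL into its parts and then expands a relative image path such as "images/marine.jpg" against the page it appeared on. The page URL is just the example URL from earlier in the thread.

```python
from urllib.parse import urljoin, urlparse

page = "http://www.example.com/dir/example.html"

# Split a URL into its components.
parts = urlparse(page)
print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /dir/example.html

# Resolve relative links (e.g. an <img src>) against the page they came from.
print(urljoin(page, "images/marine.jpg"))
# http://www.example.com/dir/images/marine.jpg
print(urljoin(page, "/images/marine.jpg"))
# http://www.example.com/images/marine.jpg
```

A relative path replaces only the last path segment of the page URL, while a path starting with "/" replaces the whole path.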

regards
Andreas


regex syntax

2004-12-06 Thread Andreas Volz
Hi,

I'm not great at regex, but I know enough for the essentials. Somehow,
though, I can't get my head around the syntax of the re.* functions in
Python. Maybe someone can show me using this example:

string = "bild.jpg"

I simply want to know whether the string contains ".jpg" or not, and
then make a decision. I've already thought about just comparing the last
four characters of the string with that sequence "by hand" and avoiding
regex that way. But I'll have to use it sooner or later anyway ;-)
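A short sketch of three ways to make that decision in Python 3. The variable is named `filename` here (as in the replies below) because `string` shadows the standard `string` module; the result variable names are illustrative only.

```python
import re

filename = "bild.jpg"

# 1. Substring test: True if ".jpg" occurs anywhere in the string.
contains_jpg = ".jpg" in filename

# 2. Suffix test: the "compare the last four characters" idea, done by str.
ends_with_jpg = filename.endswith(".jpg")

# 3. Regex version: \. is a literal dot and $ anchors at the end of the string.
#    re.search scans the whole string; re.match only tries at position 0.
matches_jpg = re.search(r"\.jpg$", filename) is not None

if ends_with_jpg:
    print("looks like a JPEG file")
```

For a fixed suffix, `endswith` is the simplest and clearest choice; the regex only pays off once the pattern has alternatives or wildcards.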

regards
Andreas


Re: cut strings and parse for images

2004-12-07 Thread Andreas Volz
On Tue, 07 Dec 2004 00:40:02 GMT, Paul McGuire wrote:

> Is this in the ballpark of where you are trying to go?

Yes, thanks. You helped me a lot.

Andreas


Re: regex syntax

2004-12-07 Thread Andreas Volz
On Mon, 6 Dec 2004 17:24:35 -0800 (PST), [EMAIL PROTECTED] wrote:

> I don't speak German, but:

Ahh! Sorry for this! It was a mistake :-(

regards
Andreas


Re: regex syntax

2004-12-07 Thread Andreas Volz
On 6 Dec 2004 17:43:21 -0800, [EMAIL PROTECTED] wrote:

> much better than the equivalent regexp:
> 
> >>> re.match('.*\.jpg$', filename)

OK, now I've chosen this regex:

> '.*\.(?i)jpe?g'

to match .jpg, .JPG, .jpeg and .JPEG.

It seems to work. Is this correct?
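A hedged sketch of a safer version in Python 3. The pattern above has two weak spots: without a `$` anchor it also accepts names like "bild.jpg.txt", and the inline `(?i)` flag sits in the middle of the pattern. Older Python versions applied such a flag to the whole pattern anyway (later ones with a DeprecationWarning), but Python 3.11 and newer reject it with `re.error`. Passing `re.IGNORECASE` to `re.compile` avoids the issue. The names `JPEG_RE` and `is_jpeg_name` are hypothetical.

```python
import re

# Case-insensitive: matches .jpg, .JPG, .jpeg, .JPEG (and mixed case like .Jpg).
# \. is a literal dot, e? makes the "e" optional, $ anchors at the end.
JPEG_RE = re.compile(r"\.jpe?g$", re.IGNORECASE)


def is_jpeg_name(name):
    """Return True if name ends in a JPEG extension (hypothetical helper)."""
    # search() needs no leading '.*' because it scans the whole string.
    return JPEG_RE.search(name) is not None


for name in ["bild.jpg", "BILD.JPG", "photo.jpeg", "bild.jpg.txt", "bildjpg"]:
    print(name, is_jpeg_name(name))
```

Without a regex, `name.lower().endswith((".jpg", ".jpeg"))` expresses the same check.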

regards
Andreas


regex for URL parameter

2004-12-07 Thread Andreas Volz
Hi,

I'm trying to extract an HTTP target URL that is passed as a parameter in
another URL. urlparse couldn't really help me, so I tried this:

import re

url = "http://www.example.com/example.html?url=http://www.example.org/example.html"

p = re.compile('.*url=')
url = p.sub('', url)
print url
> http://www.example.org/example.html

This works, but it fails when there are more parameters:

url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"

p = re.compile('.*url=')
url2 = p.sub('', url2)
print url2
> http://www.example.org/example.html&param=1

I played around with regexes to find one that also matches the second
case with multiple parameters. I think it's easy, but I don't know how to
do it. Can you help me?
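A minimal sketch of two approaches in Python 3. The first lets `urllib.parse` split the query string into named parameters, so "url" and "param" come out separately. The second keeps the regex idea but captures only up to the next "&" instead of substituting everything before "url=". Both assume the embedded URL has no unescaped "&" of its own; a properly percent-encoded inner URL is decoded by `parse_qs` automatically.

```python
import re
from urllib.parse import parse_qs, urlsplit

url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"

# Option 1: split off the query string and parse it into a dict of lists.
query = urlsplit(url2).query   # "url=http://www.example.org/example.html&param=1"
params = parse_qs(query)       # {'url': ['http://...'], 'param': ['1']}
target = params["url"][0]
print(target)  # http://www.example.org/example.html

# Option 2: a regex that stops at the next "&".
# [?&]url= finds the parameter name; ([^&]*) captures any run of non-"&" chars.
m = re.search(r"[?&]url=([^&]*)", url2)
if m:
    print(m.group(1))  # http://www.example.org/example.html
```

Option 1 is usually preferable because it also handles parameter order and percent-encoding; the regex is a quick fix for simple, unencoded cases.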

regards
Andreas