Re: How to get the source code of an url?

donarb Tue, 27 Nov 2012 10:17:22 -0800

You're not parsing XML. It's HTML, and it's not well formed: for example, 
your title and author tags have closing tags that don't match. The markup 
needs to be valid XHTML before an XML parser such as minidom can handle it. 
You might want to try a more forgiving tool instead, like Scrapy or 
Beautiful Soup.
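
To illustrate the difference, here is a minimal sketch in Python 3 that uses only the standard library. It is not Scrapy or Beautiful Soup: it swaps in the stdlib `html.parser` module as a stand-in for a forgiving parser. `SAMPLE` is the markup as posted in the question; `strict_parse_fails`, `BookIdCollector` and `extract_book_ids` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Markup as posted in the question, mismatched closing tags included.
SAMPLE = """<html><head><style type="text/css"></style></head><body><bibliothque>
 <book>
 <id>747</id>
 <title>L'alchimiste</nomclient>
 <author>Paulo Cohelo </nomposte>
 </book>
 </bibliothque>
</body>"""


def strict_parse_fails(markup):
    """Return True if minidom rejects the markup as not well-formed XML."""
    try:
        parseString(markup)
    except ExpatError:
        return True
    return False


class BookIdCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <id> inside a <book>."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self._in_book = False
        self._in_id = False

    def handle_starttag(self, tag, attrs):
        if tag == "book":
            self._in_book = True
        elif tag == "id" and self._in_book:
            self._in_id = True

    def handle_endtag(self, tag):
        if tag == "book":
            self._in_book = False
        elif tag == "id":
            self._in_id = False

    def handle_data(self, data):
        # Only keep text that appears between <id> and </id>.
        if self._in_id:
            self.ids.append(data.strip())


def extract_book_ids(markup):
    """Feed the markup to the lenient parser and return the collected ids."""
    parser = BookIdCollector()
    parser.feed(markup)
    parser.close()
    return parser.ids
```

Calling `strict_parse_fails(SAMPLE)` shows whether the XML parser rejects the markup, while `extract_book_ids(SAMPLE)` reads the ids without requiring matching tags. For real pages, a dedicated library like the ones named above is likely the more comfortable choice.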


On Tuesday, November 27, 2012 3:32:16 AM UTC-8, wbc wrote:
>
> I'm trying to parse XML from a url with minidom. I have a url that 
> returns my xml data.
>
> This is my code:
>
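> import urllib2
> from xml.dom.minidom import parse
>
> url = "http://myurl.com/wsname.asp"
> datasource = urllib2.urlopen(url)
>
> # parse() accepts a file-like object such as the urlopen() response
> dom = parse(datasource)
> handleElements(dom)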
>
> my handleElements function to parse xml:
>
> def handleElements(dom):
>     Elements = dom.getElementsByTagName("book")
>     for item in Elements:
>         getText(item.getElementsByTagName("id")[0].childNodes)
>         ....
>
> My xml:
>
> <html><head><style type="text/css"></style></head><body><bibliothque>
>  <book>
>  <id>747</id>
>  <title>L'alchimiste</nomclient>
>  <author>Paulo Cohelo </nomposte>
>  </book> 
>  ...
>  </bibliothque>  
> </body>
>
> I get no error, but no result!
>
> My handleElements() works fine: when I copy the same data from my 
> url into a string and use parseString instead of parse, everything 
> works and I get my results.
>
> But when I open the url with urlopen, Elements is empty and the loop 
> never starts.
>
> *
> *
>
> It seems that I need to get the source code of the url (not its content), 
> like view-source in Chrome. How can I do that?
>
> Thanks
>
