Urllib's urlopen and urlretrieve

2013-02-21 Thread qoresucks
I only just started Python and, given that I know nothing about network 
programming or internet programming of any kind really, I thought it would be 
interesting to try to write something that could create an archive of a website 
for myself. With this I started trying to use the urllib library; however, I am 
having a problem understanding why certain things won't work with 
urllib.urlretrieve, or with urllib.urlopen followed by read().

Why is it that when I use urllib.urlopen and then read(), or urllib.urlretrieve, 
I only get parts of the sites, losing the formatting, images, 
etc.? How can I get around this?
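The likely explanation: urlopen(...).read() and urlretrieve fetch exactly one resource, the HTML document at that URL. The stylesheets, images and scripts that make the page look right are separate files that the HTML only references by URL; a browser reads the HTML and then makes a further request for each of them. A minimal sketch (Python 3 standard library; the names ResourceFinder and find_resources and the sample page are illustrative, not from the thread) that lists those extra URLs:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect URLs of stylesheets, images and scripts referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        url = None
        if tag in ("img", "script"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            url = attrs.get("href")
        if url:
            # Relative URLs such as "css/style.css" only make sense against the page URL.
            self.resources.append(urljoin(self.base_url, url))


def find_resources(html_text, page_url):
    finder = ResourceFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.resources


page = """<html><head>
<link rel="stylesheet" href="css/style.css">
<script src="/js/app.js"></script>
</head><body><img src="images/logo.png"></body></html>"""

for url in find_resources(page, "http://example.com/book/intro.html"):
    print(url)
```

Each of those URLs can then be fetched with urlretrieve just like the page itself. CSS files can reference further files (url(...) for backgrounds and fonts), which this sketch does not follow.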

Lastly, while it's a bit off topic, I lack a good understanding of network 
programming as a whole. Whether it's making programs communicate or simply 
extracting data from URLs, I don't know where to even begin, which has led me to 
learn Python, hoping to understand it better and then carry that over to the other 
languages I know. Can anyone give me some advice on where to begin learning 
this? Even if it's in another language.
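One concrete starting point is seeing roughly what urlopen does for an http:// URL: it opens a TCP connection to port 80 and sends a plain-text HTTP request, and the server answers with a status line, headers, a blank line, and then the body. A sketch using only the socket module, assuming plain HTTP (no HTTPS) and an HTTP/1.0 request so the server closes the connection when it is done; build_request, split_response and fetch are hypothetical helper names, and only the offline parsing part is exercised below:

```python
import socket


def build_request(host, path="/"):
    # An HTTP/1.0 request is just text lines ending in CRLF, terminated by a blank line.
    return ("GET {} HTTP/1.0\r\n"
            "Host: {}\r\n"
            "\r\n").format(path, host).encode("ascii")


def split_response(raw):
    # Headers and body are separated by the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host, path="/", port=80):
    # Plain HTTP only; HTTPS would need the ssl module wrapped around this socket.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes the connection after the body
                break
            chunks.append(data)
    return split_response(b"".join(chunks))


# Offline demonstration with a canned response instead of a real server:
raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
print(split_response(raw))
```

The same request/response pattern (connect, send bytes, read bytes until done) underlies most network programming, whatever the language.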
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Urllib's urlopen and urlretrieve

2013-02-21 Thread qoresucks
Initially I was just trying to get the HTML, but later, when I attempted more 
complicated sites that weren't my own, I noticed that large parts of the site 
were lost in the process. The urllib code essentially looks like what I was 
trying, but it didn't work as I had expected.

To be more specific, after I got it working for my own little page, I attempted 
to take it further and get all the lessons from Learn Python The Hard Way. When 
I tried the same method on the first intro page to see if I was even getting it 
right, the HTML code was all there, but when I opened the file the formatting was 
all wrong: the background colors were off, and the images etc. were all missing. 
So clearly I have misunderstood something, and it's something critical I 
need to understand.
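Symptoms like these usually mean the page refers to its stylesheet and images with relative URLs (something like css/style.css); when the saved file is opened from disk, the browser looks for them next to that file, where they do not exist. One low-effort workaround, assuming it is acceptable for the saved copy to load its styling and images from the live site, is to insert an HTML <base href="..."> element so that relative URLs resolve against the original page. The function add_base_href is a hypothetical name; a true offline archive would instead download each resource and rewrite the links to local paths.

```python
import re
from html import escape


def add_base_href(document, page_url):
    """Insert <base href=page_url> right after the opening <head> tag.

    With a <base> element, the browser resolves relative links in the saved
    copy against the original site instead of the local folder.
    """
    base_tag = '<base href="{}">'.format(escape(page_url, quote=True))
    # Match "<head>" or "<head ...attributes>", case-insensitively; first occurrence only.
    new_document, count = re.subn(r"<head\b[^>]*>", lambda m: m.group(0) + base_tag,
                                  document, count=1, flags=re.IGNORECASE)
    if count == 0:
        # No <head> tag found: put the <base> element at the very start instead.
        new_document = base_tag + document
    return new_document


saved = '<html><head><link rel="stylesheet" href="css/style.css"></head><body></body></html>'
print(add_base_href(saved, "http://example.com/book/intro.html"))
```

A regular expression is enough here because only one tag is being located; anything more involved (rewriting every link) is better done with an HTML parser.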

As for the OS, I primarily use Mac OS, but I'm well versed in Linux and Windows 
if there is anything specific out there that might help.

As for which version of Python, I have been using Python 2 to learn on, as I 
heard that Python 3 was still largely unadopted due to a lack of library 
support etc. by comparison. Are people adopting it fast enough now that I 
should consider learning on 3 instead of 2?
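Whichever version is chosen, it helps to know that the two calls from the original question moved in Python 3: urllib.urlopen became urllib.request.urlopen (whose read() returns bytes, so text must be decoded), and urllib.urlretrieve became urllib.request.urlretrieve. A small sketch; it fetches a file:// URL pointing at a temporary file only so that it runs without network access, and an http:// URL is used the same way:

```python
import os
import pathlib
import tempfile
import urllib.request

# Create a small local page so the example does not need the network.
tmpdir = tempfile.mkdtemp()
page_path = pathlib.Path(tmpdir, "page.html")
page_path.write_text("<html><body>Hello</body></html>", encoding="utf-8")
url = page_path.as_uri()  # a file:///... URL for the temporary page

# Python 3 equivalent of urllib.urlopen(url).read(); read() returns bytes.
with urllib.request.urlopen(url) as response:
    text = response.read().decode("utf-8")

# Python 3 equivalent of urllib.urlretrieve(url, filename=...).
copy_path = os.path.join(tmpdir, "copy.html")
filename, headers = urllib.request.urlretrieve(url, filename=copy_path)

print(text)
print(filename)
```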

Also, I'm not doing it so much for technical reasons; rather, I thought it 
would be interesting and fun to learn some form of internet/network 
programming. Granted, it's not the best approach, but I'm not really aware of 
many others, and it does seem interesting to me.

I agree that Python programming probably isn't the best way to approach this 
initially, but I wasn't sure what to research to get a better grasp of 
network/internet/web programming, so I figured I would just dive in head first and 
figure things out. My initial goal was to reinforce my programming while learning 
internet/network programming.

Thank you all for your responses though. :)


On Thursday, February 21, 2013 7:59:26 AM UTC-5, Michael Herman wrote:
> Are you just trying to get the HTML? If so, you can use this code:
>
> import urllib  # Python 2; in Python 3 use urllib.request.urlretrieve
>
> # fetch a webpage and save it as test.html
> urllib.urlretrieve("http://www.web2py.com/", filename="test.html")
>
> I recommend using the requests library, as it's easier to use and more 
> powerful:
>
> import requests
>
> # retrieve the webpage
> r = requests.get("http://www.web2py.com/")
> r.raise_for_status()  # stop if the server returned an error status
>
> # write the raw bytes of the response to test_requests.html
> with open("test_requests.html", "wb") as code:
>     code.write(r.content)
> 
> If you want to get up to speed quickly on internet programming, I have a 
> course I am developing. It's on Kickstarter: http://kck.st/VQj8hq. The first 
> section of the book dives into web fundamentals and internet programming.