Rob Hudson wrote:
wget -E --load-cookies /path/to/firefox/profiles/cookies.txt -r -k -l
-r = recurse ...

I missed this option in pycurl and have yet to find an example that
supports it :(  I then scanned the curl FAQ and found entry 3.15 [0]:

3.15 Can I do recursive fetches with curl?
http://curl.mirrors.cyberservers.net/docs/faq.html#3.15

This means that to use pycurl you need a list of URLs, which is
difficult because you have to parse each returned page yourself. I
would have thought curl could support this; obviously not. One obvious
approach is to point pycurl at a base URL, then for each returned page
(assuming it is HTML) parse it with HTMLParser [1], build a list of the
URLs it contains, and fetch those pages in turn.
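A rough sketch of that parse step, using html.parser (the Python 3 name
for the HTMLParser module); the LinkCollector class name and the sample
page are my own invention, and you would feed it whatever pycurl (or
anything else) returns:

```python
from html.parser import HTMLParser  # the "HTMLParser" module in older Pythons
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links so they can be fetched directly.
                    self.links.append(urljoin(self.base_url, value))


# Feed it a fetched page and read .links to get the next URLs to fetch:
page = '<a href="/about">About</a> <a href="http://example.com/x">x</a>'
collector = LinkCollector("http://example.com/")
collector.feed(page)
print(collector.links)  # ['http://example.com/about', 'http://example.com/x']
```

A real mirroring loop would then fetch each collected URL (skipping ones
already seen) and feed those pages through the same parser.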

Not what you asked, I know. If you want a quick hack, do what Rob
Hudson suggests and use wget (which you probably did anyway), or else
try an old favourite of mine, websucker [2]. The reason I looked at
pycurl was the obvious Django/pycurl integration (i.e. a Django app
that mirrors sites).


Reference
----------------
[0] Curl FAQ, 'Can I do recursive fetches with curl? No.'
http://curl.mirrors.cyberservers.net/docs/faq.html#3.15
[Accessed Saturday, 6 January, 2007]

[1] Python HTMLParser module, 'Parses text files in format of HTML &
XHTML'
http://docs.python.org/lib/module-HTMLParser.html
[Accessed Saturday, 6 January, 2007]

[2] python websucker, 'creates "mirror copy of a remote site"'
http://svn.python.org/view/python/trunk/Tools/webchecker/
[Accessed Saturday, 6 January, 2007]


You received this message because you are subscribed to the Google Groups "Django users" group.