jkn,

> MRAB's scheme does have the disadvantages to me that Chris has pointed
> out.
Nothing that can't be countered by keeping copies of the last X to-be-downloaded-URLs files. As for rewriting every time: you will /have/ to write something for every action (and flush the file!) if you want to be able to ctrl-c (or worse) out of the program.

But you could opt to write this session's successfully downloaded URLs to a separate file, and only merge that with the original one at program start (a rough sketch of what I mean is below the sig). That, together with an integrity check of the separate file (possibly on a line-by-line (URL) basis), should make corruption of the original file rather unlikely.

A database /sounds/ good, but what happens when you ctrl-c out of a non-atomic operation? How do you fix that? IOW: databases can be corrupted for pretty much the same reasons as a simple data file (but with much worse consequences).

Also think of the old adage: "I had a problem, and then I thought I could use X. Now I have two problems..." - with X traditionally being "regular expressions". In other words: do KISS (keep it ....).

By the way: the "just write the URLs in a folder" method is not at all a bad one (second sketch below). /Very/ easy to maintain, resilient (especially when you consider the self-repairing capabilities of some filesystems), and the polar opposite of "customer lock-in". :-)

Regards,
Rudy Wieser
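P.s. Something along these lines is what I mean by the separate done-file. It is a minimal sketch only; the file names "urls.txt" and "urls.done" are just examples, and error handling is left out.

import os

PENDING = "urls.txt"   # one URL per line, the to-be-downloaded list
DONE = "urls.done"     # URLs finished during this session

def merge_done_into_pending():
    """At program start, drop every URL already in the done-file from
    the pending file, then throw the done-file away."""
    if not os.path.exists(DONE):
        return
    with open(DONE) as f:
        # Line-by-line integrity check: keep only lines that still look
        # like URLs; a torn last line is simply ignored.
        done = {line.strip() for line in f if line.strip().startswith("http")}
    with open(PENDING) as f:
        pending = [line.strip() for line in f if line.strip()]
    remaining = [url for url in pending if url not in done]
    with open(PENDING, "w") as f:
        f.write("\n".join(remaining) + ("\n" if remaining else ""))
    os.remove(DONE)

def mark_done(url):
    """After each successful download, append and flush immediately, so
    a ctrl-c can lose at most the line currently being written."""
    with open(DONE, "a") as f:
        f.write(url + "\n")
        f.flush()
        os.fsync(f.fileno())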
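And the "URLs in a folder" method can be as simple as the following (again just a sketch; the "todo" directory name is made up). One marker file per pending URL, named after a hash of the URL so the name is always filesystem-safe; deleting the file marks the URL as done.

import hashlib
from pathlib import Path

TODO = Path("todo")

def add_url(url):
    """One file per pending URL; the URL itself is the file's content."""
    TODO.mkdir(exist_ok=True)
    (TODO / hashlib.sha1(url.encode()).hexdigest()).write_text(url + "\n")

def pending_urls():
    """Every file still present is a URL that has not been downloaded yet."""
    return [p.read_text().strip() for p in sorted(TODO.glob("*"))]

def mark_done(url):
    """Removing the marker file marks the URL as done; an unlink either
    happens or it doesn't, so a ctrl-c leaves the folder consistent."""
    (TODO / hashlib.sha1(url.encode()).hexdigest()).unlink(missing_ok=True)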