On Wednesday, January 29, 2020 at 8:27:03 PM UTC, Chris Angelico wrote:
> On Thu, Jan 30, 2020 at 7:06 AM jkn <jkn...@nicorp.f9.co.uk> wrote:
> >
> > Hi all
> > I'm almost embarrassed to ask this as it's "so simple", but thought I'd
> > give it a go...
>
> Hey, nothing wrong with that!
>
> > I want to be able to use a simple 'download manager' which I was going to
> > write (in Python), but then wondered if there was something suitable
> > already out there. I haven't found it, but thought people here might have
> > some ideas for existing work, or approaches.
> >
> > The situation is this - I have a long list of file URLs and want to
> > download these as a 'background task'. I want this process to be 'crudely
> > persistent' - you can CTRL-C out, and next time you run things it will
> > pick up where it left off.
>
> A decent project. I've done this before but in restricted ways.
>
> > The download part is not difficult. It is the persistence bit I am
> > thinking about. It is not easy to tell the name of the downloaded file
> > from the URL.
> >
> > I could have a file with all the URLs listed and work through each line
> > in turn. But then I would have to rewrite the file (say, with the
> > previously-successful lines commented out) as I go.
> >
>
> Hmm. The easiest way would be to have something from the URL in the
> file name. For instance, you could hash the URL and put the first few
> digits of the hash in the file name, so
> http://some.domain.example/some/path/filename.html might get saved
> into "a39321604c - filename.html". That way, if you want to know if
> it's been downloaded already, you just hash the URL and see if any
> file begins with those digits.
>
> Would that kind of idea work?
>
> ChrisA

Hi Chris

Thanks for the idea. I should perhaps have said more clearly that it is not
easy (though perhaps not impossible) to infer the name of the downloaded data
from the URL - it is not a 'simple' file URL, more of a tag.

However I guess your scheme would work if I just hashed the URL and created
a marker file - "a39321604c.downloaded" - once downloaded. The downloaded
content would be separately (and somewhat opaquely) named, but that doesn't
matter.

MRAB's scheme does have the disadvantages to me that Chris has pointed out.

Jon N
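
For illustration, the marker-file variant might look something like the sketch below. It is not code from anyone in the thread. The names `marker_path`, `fetch` and `download_all`, the choice of SHA-256, the 10-digit prefix, and the `.bin` naming of the saved content are all assumptions. The one point it tries to show is that the marker is written only after the download has finished, so a CTRL-C partway through leaves that URL to be retried on the next run.

```python
import hashlib
import urllib.request
from pathlib import Path


def marker_path(url: str, state_dir: Path) -> Path:
    """Marker file for a URL, named after the first 10 hex digits of its SHA-256."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:10]
    return state_dir / f"{digest}.downloaded"


def fetch(url: str, dest_dir: Path) -> None:
    """Placeholder downloader (an assumption): saves the body under an opaque name."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".bin"
    with urllib.request.urlopen(url) as response:
        data = response.read()
    (dest_dir / name).write_bytes(data)


def download_all(urls, state_dir: Path, dest_dir: Path, fetch_one=fetch) -> list:
    """Fetch every URL that has no marker yet; return the URLs fetched this run."""
    state_dir.mkdir(parents=True, exist_ok=True)
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        marker = marker_path(url, state_dir)
        if marker.exists():
            continue  # already completed on an earlier run
        fetch_one(url, dest_dir)  # CTRL-C or an error here leaves no marker behind
        marker.write_text(url + "\n")  # record success only once the download is done
        fetched.append(url)
    return fetched
```

Running `download_all(urls, Path("state"), Path("data"))` a second time skips anything that already has a marker. Storing the URL inside the marker file is optional, but it makes the otherwise opaque file names easy to trace back.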