On 08.10.2012, at 07:38, ptone <[email protected]> wrote:

> so after scanning this thread and the ticket again - it is still unclear that
> there could be a completely universal solution.
>
> While it would be nice if the storage API had a checksum(name) or md5(name)
> method - not all custom storage backends are going to support a single
> checksum standard. S3 doesn't explicitly support MD5 (apparently it
> unofficially does through ETags). Without a universal checksum - you can't
> use it to compare files across arbitrary backends.

You're able to ask S3 for the date of last modification, I don't see why a comparison by hashing the file content is needed additionally. It'd have to download the full file to do that on Django's side and I'm not aware of an API for getting a hash from cloudfiles, S3 etc.

> I do agree that hacking modified_time return value is a little ugly - the API
> is clearly documented as "returns a datetime..." - so returning an MD5
> checksum there is, well, hacky.

I beg to differ, returning a datetime object makes absolute sense for comparing it to another datetime object. What I meant before is that the modified_time method can be written however the user wants as long as it returns a datetime object, even a date that is known to be older than the file on disk.

> If you are passionate about moving this forward, here is what I'd suggest.
>
> Implement, document, and test .md5(name) as a standard method on storage
> backends - like modified_time this would raise NotImplementedError if not
> available - this could easily be its own ticket. md5 is probably the closest
> you'll get to a checksum standard.

-1

Jannis

> On Sunday, October 7, 2012 8:59:16 PM UTC-7, Dan Loewenherz wrote:
> This issue just got me again tonight, so I'll try to push once more on this
> issue. It seems right now most people don't care that this is broken, which
> is a bummer, but in which case I'll just continue using my working solution.
>
> Dan
>
> On Sat, Oct 6, 2012 at 10:48 AM, Dan Loewenherz <[email protected]> wrote:
> Hey Jannis,
>
> On Mon, Oct 1, 2012 at 12:47 AM, Jannis Leidel <[email protected]> wrote:
>
> On 30.09.2012, at 23:41, Dan Loewenherz <[email protected]> wrote:
>
> > Many backends don't support last modified times, and even if they all did,
> > it's incorrect to assume that last modified time is an accurate heuristic
> > for whether a file has already been uploaded or not.
>
> Well but it's an accurate way to decide whether a file has been changed on
> the filesystem, and that's what collectstatic cares about. The storage
> backend *is* the API to extend that when needed, so feel free to use it.
>
> It's accurate *only* in certain situations. And on a distributed development
> team, I've run into a lot of issues with developers re-uploading files that have
> already been uploaded because they just recently updated their repo.
>
> A checksum is the only true accurate method to determine if a file has
> changed.
>
> Additionally, you didn't address my point that I quoted from. Storage
> backends don't just reflect filesystems--they could reflect files stored in a
> database, S3, etc. And some of these filesystems don't support last modified
> times.
>
> > It might be a better idea to let the backends decide when a file has been
> > changed (instead of just calling the backend's last modified method).
>
> I don't understand, you can easily implement exactly that in the
> last_modified method if you'd like.
>
> This is a bit confusing...why call it last_modified when that doesn't
> necessarily reflect what it's doing? It would be more flexible to create two
> methods:
>
> def modification_identifier(self):
>
> def has_changed(self):
>
> Then, any backend could implement these however they might like, and
> collectstatic would have no excuse in uploading the same file more than once.
> Overloading last_modified to also do things like calculate md5's seems a bit
> hacky to me, and confusing for any developer maintaining a custom storage
> backend that doesn't support last modified.
>
> Dan
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/django-developers/-/weKD2x1XY4oJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
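
To make the two positions in this thread concrete, here is a minimal sketch of how a checksum comparison with a timestamp fallback could look. It is not Django's storage API or collectstatic's actual logic: SketchStorage, ChecksumStorage and needs_copy are hypothetical names invented for illustration, and only the md5(name) method raising NotImplementedError follows the suggestion quoted above.

```python
import hashlib
from datetime import datetime, timezone


class SketchStorage:
    """Hypothetical in-memory stand-in for a storage backend (not Django's API)."""

    def __init__(self):
        self._files = {}  # name -> (content bytes, modified datetime)

    def save(self, name, content, modified):
        self._files[name] = (content, modified)

    def exists(self, name):
        return name in self._files

    def modified_time(self, name):
        # Returns a datetime, as the thread says this method is documented to do.
        return self._files[name][1]

    def md5(self, name):
        # As proposed in the thread: backends without checksums raise this.
        raise NotImplementedError("this backend cannot provide a checksum")


class ChecksumStorage(SketchStorage):
    """Hypothetical backend that can report an MD5 hex digest per file."""

    def md5(self, name):
        return hashlib.md5(self._files[name][0]).hexdigest()


def needs_copy(source, target, name):
    """Decide whether `name` should be copied from source to target."""
    if not target.exists(name):
        return True
    try:
        # Preferred path: compare content checksums when both sides support it.
        return source.md5(name) != target.md5(name)
    except NotImplementedError:
        # Fallback: copy only if the target is older than the source.
        return target.modified_time(name) < source.modified_time(name)


if __name__ == "__main__":
    old = datetime(2012, 10, 1, tzinfo=timezone.utc)
    new = datetime(2012, 10, 8, tzinfo=timezone.utc)
    local, remote = ChecksumStorage(), ChecksumStorage()
    local.save("site.css", b"body{}", new)   # fresh checkout, newer timestamp
    remote.save("site.css", b"body{}", old)  # same content already uploaded
    print(needs_copy(local, remote, "site.css"))
```

In the distributed-team case Dan describes, the local timestamp is newer only because the repository was just updated; with checksums available on both sides the sketch skips the upload, and without them it falls back to the datetime comparison Jannis defends.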
