On 08.10.2012, at 07:38, ptone <[email protected]> wrote:

> so after scanning this thread and the ticket again - it is still unclear that 
> there could be a completely universal solution.
> 
> While it would be nice if the storage API had a checksum(name) or md5(name) 
> method - not all custom storage backends are going to support a single 
> checksum standard.  S3 doesn't explicitly support MD5 (apparently it 
> unofficially does through ETags).  Without a universal checksum - you can't 
> use it to compare files across arbitrary backends.

You're able to ask S3 for the date of last modification, so I don't see why a 
comparison by hashing the file content is needed in addition. Django would have 
to download the full file to do that on its side, and I'm not aware of an API 
for getting a hash from cloudfiles, S3 etc.
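
A rough sketch of the timestamp comparison under discussion, for illustration 
only. `should_copy` and its arguments are hypothetical names, not 
collectstatic's actual code; the assumption is simply that a file is copied 
when the source is newer than the stored copy, or when no stored copy exists.

```python
from datetime import datetime


def should_copy(source_modified, target_modified):
    """Decide whether a file needs to be copied, using timestamps only.

    target_modified is None when the target storage has no copy yet.
    """
    if target_modified is None:
        return True
    # Copy only when the source is strictly newer than the stored copy.
    return source_modified > target_modified


local = datetime(2012, 10, 7, 20, 59)
remote = datetime(2012, 10, 6, 10, 48)
print(should_copy(local, remote))
```

No file content is read or downloaded here; only two datetimes are compared.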

> I do agree that hacking modified_time return value is a little ugly - the API 
> is clearly documented as "returns a datetime..." - so returning an MD5 
> checksum there is, well, hacky.

I beg to differ: returning a datetime object makes absolute sense for comparing 
it to another datetime object. What I meant before is that the modified_time 
method can be written however the user wants as long as it returns a datetime 
object, even a date that is known to be older than the file on disk.
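
A minimal sketch of that idea, under stated assumptions. 
`ChecksumBackedStorage` is a hypothetical stand-in, not Django's Storage 
class, and the two dicts stand in for whatever a real backend would query. 
The only point is that modified_time always returns a datetime, while the 
decision behind it can be based on something else, such as a checksum.

```python
import hashlib
from datetime import datetime


class ChecksumBackedStorage:
    """Hypothetical stand-in for a storage backend (not Django's Storage)."""

    # Sentinel datetimes: one older than any real file, one newer.
    ALWAYS_STALE = datetime(1970, 1, 1)
    ALWAYS_CURRENT = datetime.max

    def __init__(self, remote_checksums, local_files):
        # Assumption: plain dicts, name -> MD5 hex digest and name -> bytes.
        self.remote_checksums = remote_checksums
        self.local_files = local_files

    def modified_time(self, name):
        local_md5 = hashlib.md5(self.local_files[name]).hexdigest()
        if self.remote_checksums.get(name) == local_md5:
            # Same content: report a date that makes the stored copy look current.
            return self.ALWAYS_CURRENT
        # Different or missing content: report a date older than the file on disk.
        return self.ALWAYS_STALE
```

With a caller that compares timestamps, an unchanged file would then be 
skipped and a changed one copied, without changing the method's return type.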

> If you are passionate about moving this forward, here is what I'd suggest.
> 
> Implement, document, and test .md5(name) as a standard method on storage 
> backends - like modified_time this would raise NotImplementedError if not 
> available - this could easily be its own ticket. md5 is probably the closest 
> you'll get to a checksum standard.

-1

Jannis 


> On Sunday, October 7, 2012 8:59:16 PM UTC-7, Dan Loewenherz wrote:
> This issue just got me again tonight, so I'll try to push once more on this 
> issue. It seems right now most people don't care that this is broken, which 
> is a bummer, but in which case I'll just continue using my working solution.
> 
> Dan
> 
> On Sat, Oct 6, 2012 at 10:48 AM, Dan Loewenherz <[email protected]> wrote:
> Hey Jannis,
> 
> On Mon, Oct 1, 2012 at 12:47 AM, Jannis Leidel <[email protected]> wrote:
> 
> On 30.09.2012, at 23:41, Dan Loewenherz <[email protected]> wrote:
> 
> > Many backends don't support last modified times, and even if they all did, 
> > it's incorrect to assume that last modified time is an accurate heuristic 
> > for whether a file has already been uploaded or not.
> 
> Well but it's an accurate way to decide whether a file has been changed on 
> the filesystem, and that's what collectstatic cares about. The storage 
> backend *is* the API to extend that when needed, so feel free to use it.
> 
> It's accurate *only* in certain situations. And on a distributed development 
> team, I've run into a lot of issues with developers re-uploading files that 
> have already been uploaded because they just recently updated their repo.
> 
> A checksum is the only truly accurate method to determine if a file has 
> changed.
> 
> Additionally, you didn't address my point that I quoted from. Storage 
> backends don't just reflect filesystems--they could reflect files stored in a 
> database, S3, etc. And some of these filesystems don't support last modified 
> times.
> 
> > It might be a better idea to let the backends decide when a file has been 
> > changed (instead of just calling the backend's last modified method).
> 
> I don't understand, you can easily implement exactly that in the 
> last_modified method if you'd like.
> 
> This is a bit confusing...why call it last_modified when that doesn't 
> necessarily reflect what it's doing? It would be more flexible to create two 
> methods:
> 
> def modification_identifier(self):
>     ...
> 
> def has_changed(self):
>     ...
> 
> Then, any backend could implement these however they might like, and 
> collectstatic would have no excuse for uploading the same file more than once. 
> Overloading last_modified to also do things like calculate MD5s seems a bit 
> hacky to me, and confusing for any developer maintaining a custom storage 
> backend that doesn't support last modified.
> 
> Dan
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To view this discussion on the web visit 
> https://groups.google.com/d/msg/django-developers/-/weKD2x1XY4oJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/django-developers?hl=en.
