"Dr.Ruud" <rvtol+use...@isolution.nl> writes: > On 12/06/2013 11:33, lee wrote: >> Jim Gibson <jimsgib...@gmail.com> writes: >>> On Jun 11, 2013, at 9:44 PM, lee wrote: > >>>> I've been googling for examples of how to create a sha-2 sum of a >>>> file in perl without success. What I'm looking for is something >>>> like: >>>> >>>> $hash = create_sha2_sum( $filename); >>>> >>>> Do you know of any examples I could look at? Or is there a better >>>> way to figure out if a file has been modified? >>> >>> The first thing to do would be to check the file size. If the file >>> size has changed, then the file has been modified. So you will want to >>> save the file size. >> >> The file might be modified without changing its size ... >> >>> If the files sizes are the same, then you can compare some sort of >>> digest e.g. SHA. I haven't used any, so I cannot advise. >> >> ... so I'm better off by just using a hash which I'd need anyway. > > No. If the file is real big, then calculating the hash (of the new > file) can take a long time. Which would be superfluous if the file > size also has changed. > > I store: file size, fingerprint of first 256 bytes, fingerprint of > total file. So only if both the size and the light fingerprint are the > same, I need to check the full fingerprint.
Oh, now I see your point: you are trying to avoid computing hashes for large
files when it isn't needed, so you get much better efficiency by checking
other information first.

In my application, my estimate is that there will be a set of around 100--150
files. Once a file is closed and reported one last time, it doesn't need to
be considered anymore, so the number of relevant files is limited. Each file
is only about 2kB in size, and reports will be generated only monthly.
Considering this, it seems doubtful that the extra programming and procedure
needed to handle the cases in which not /both/ mtime and size have changed is
worth the gain in performance over simply going by hash alone: the effective
difference in this case is probably like the difference between "(almost)
instantly" and "about 3 seconds".

OTOH, it's nicer to make it so that file size doesn't have a major impact on
performance, because the solution would be more versatile. Unfortunately,
creating a hash over only (random) parts of the files won't suffice, because
a different part of the file might have changed than the one sampled, and I
don't want handling such exceptions to require manual intervention, either.
This means I can't get around saving hashes for whole files; I can only save
computing hashes for those files that still have the same size /and/ the same
mtime they had a month ago.

Having said that, I do like this idea: hashes would need to be computed
during report generation only when size and mtime indicate that a file might
have changed, instead of computing them all every time just to see whether a
file did change. I think I'll probably go for that.

--
"Object-oriented programming languages aren't completely convinced
that you should be allowed to do anything with functions."
http://www.joelonsoftware.com/items/2006/08/01.html
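A minimal sketch of the size/mtime short-circuit described above, assuming
the per-file data from the previous run is available in a hash keyed by
filename (how that data is actually persisted between monthly runs is left
out here):

    use strict;
    use warnings;
    use Digest::SHA;

    # %seen stands in for last month's saved data: size, mtime and digest
    # per filename.  Loading/saving it is omitted in this sketch.
    my %seen;

    sub file_changed {
        my ($filename) = @_;
        my ($size, $mtime) = (stat $filename)[7, 9];
        my $old = $seen{$filename};

        # Cheap check first: same size and same mtime as last time means
        # the file is assumed unchanged, and no hash is computed.
        return 0 if $old && $old->{size} == $size && $old->{mtime} == $mtime;

        # Otherwise compute the full digest and compare it to the old one.
        my $digest = Digest::SHA->new(256)->addfile($filename, 'b')->hexdigest;
        my $changed = !$old || $old->{digest} ne $digest;

        # Remember the current state for the next report run.
        $seen{$filename} = { size => $size, mtime => $mtime, digest => $digest };
        return $changed;
    }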