On 22/08/2016 15:47, Richard Gaskin wrote:
> Alex Tweedly wrote:
> > Would caseSensitive make it faster ?
> In theory yes, since it avoids having to run the internal equivalent
> of toLower on each thing being compared.
> But since these are bytes, not chars, that doesn't apply.
> However in some recent experiments involving pattern matching on text
> I was unable to measure a difference. That shouldn't be taken as
> definitive; there are a lot of distracting things going on in the
> routine I was testing with. I haven't yet done a good isolated test
> of caseSensitive.
> > Re md5 for repeated use - yes, it probably is worth doing.
> The rsync algo offers an md5 option, but by default it compares files
> based only on mod date and size. The thinking is that if both of
> those match, the odds of having a changed file are very low.
> Perhaps an optimal algo in your system would reserve md5 for those
> cases where size and mod date match, which will eliminate most cases
> with less CPU time.
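For anyone who wants to see the shape of that suggestion, here is a minimal sketch. It is written in Python rather than LiveCode, and the names md5_of_file and probably_same are invented for the example. It is not rsync's actual implementation, only the "cheap checks first, md5 last" idea described above. Comparing mod dates as whole seconds is also an assumption.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def probably_same(path_a, path_b):
    """Cheap checks first: md5 is only computed when size and mod date match."""
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    # A different size means different content; no need to read either file.
    if stat_a.st_size != stat_b.st_size:
        return False
    # Assumption: mod dates are compared as whole seconds.
    if int(stat_a.st_mtime) != int(stat_b.st_mtime):
        return False
    # Size and mod date agree, so pay for the hashes to confirm.
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Note that a copy whose mod date differs from the original is reported as "not the same" here without its content ever being read.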
Thanks Richard, but this is a very different context. In my case, the
mod dates will never match; the duplicate files arise because the user
has imported the same photos from a camera more than once (into
different folders, or into the same one using auto-renaming), or has
copied a folder of files to trim out the ones to be copied to another
machine, or .... any of a number of things, but all causing the copied
file to have a different mod date from the original.
My original benchmarking was faulty; in fact, taking the md5 hash of the
two files is only about 50% more expensive than simply comparing them
(more if they actually differ, presumably because a direct comparison
can stop at the first mismatch), but that leaves the conclusion
unchanged - it's not worth the extra complexity. There is an assumption
underlying this - that in real life (different from my development
phase), the majority of genuine duplicates will be dealt with (i.e. one
copy deleted or moved elsewhere) fairly quickly, so the same comparisons
won't be run repeatedly. The remaining cases of same file size are so
rare (around 80 in my full 50,000 file set) that pair-wise comparisons
take only 4 seconds (or 2 seconds if I use an older version of LC), so
no great impact on the user experience.
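A rough sketch of the size-first approach described above, in Python rather than LiveCode and purely for illustration: find_duplicates is a made-up name and this is not the actual script. Files are grouped by size, and only files that share a size are compared pair-wise, byte for byte, with no hashing involved.

```python
import filecmp
import os
from collections import defaultdict
from itertools import combinations


def find_duplicates(root):
    """Return (path, path) pairs of byte-identical files found under root."""
    # Pass 1: bucket every regular file by its size in bytes.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: only files sharing a size can be duplicates, so compare
    # those pair-wise. shallow=False forces a real content comparison
    # instead of trusting size and mod date.
    duplicates = []
    for paths in by_size.values():
        for first, second in combinations(sorted(paths), 2):
            if filecmp.cmp(first, second, shallow=False):
                duplicates.append((first, second))
    return duplicates
```

Because mod dates are never consulted, re-imported or copied photos with fresh mod dates are still caught.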
(The other parts of the overall workflow - where I would like to gather
and use the exif data - are more strongly impacted by the performance
issue - but my desire to use the latest LC8 release rather than an
obsolete version is probably strong enough to override that, and I'll
just be more patient - even though patient is not my natural state :-) )
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode