On 22/08/2016 15:47, Richard Gaskin wrote:
> Alex Tweedly wrote:
>
>> Would caseSensitive make it faster?
>
> In theory yes, since it avoids having to run the internal equivalent of toLower on each thing being compared.
>
> But since these are bytes, not chars, that doesn't apply.
>
> However, in some recent experiments involving pattern matching on text I was unable to measure a difference. That shouldn't be taken as definitive; there are a lot of distracting things going on in the routine I was testing with. I haven't yet done a good isolated test of caseSensitive.
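
For anyone who wants a quick isolated test, something along these lines should do it - a rough, untested sketch with made-up variable names (tNeedle and tHaystack would be set up beforehand):

   set the caseSensitive to true   -- flip to false for the comparison run
   put the milliseconds into tStart
   repeat 100000 times
      get (tNeedle is in tHaystack)   -- any text comparison affected by caseSensitive
   end repeat
   put the milliseconds - tStart into tElapsed   -- elapsed time in ms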


>> Re md5 for repeated use - yes, it probably is worth doing.
>
> The rsync algo offers an md5 option, but by default it compares files based only on mod date and size. The thinking is that if both of those match, the odds of having a changed file are very low.
>
> Perhaps an optimal algo in your system would reserve md5 for those cases where size and mod date match, which will eliminate most cases with less CPU time.
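
For concreteness, the staged check you describe might look something like this - a sketch only, assuming the size and mod date for each file have already been gathered (e.g. from "the detailed files"), and with made-up handler and parameter names:

   function probableDuplicate pPathA, pSizeA, pModA, pPathB, pSizeB, pModB
      -- cheap checks first: size, then mod date
      if pSizeA is not pSizeB then return false
      if pModA is not pModB then return false
      -- only pay for the hashes when both cheap checks match
      return md5Digest(URL ("binfile:" & pPathA)) is md5Digest(URL ("binfile:" & pPathB))
   end probableDuplicate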

Thanks Richard, but this is a very different context. In my case, the mod dates will never match; the duplicate files arise because the user has imported the same photos from a camera more than once (into different folders, or into the same one using auto-renaming), or has copied a folder of files to trim out the ones to be copied to another machine, or ... any of a number of things, but all of them leave the copied file with a different mod date from the original.

My original benchmarking was faulty; in fact, taking the md5 hash of the two files is only about 50% more expensive than simply comparing them directly (higher if they are actually different, since a direct comparison can stop at the first mismatch), but that leaves the conclusion unchanged - it's not worth the extra complexity. There is an assumption underlying this: that in real life (unlike during my development phase), the majority of genuine duplicates will be dealt with (i.e. one copy deleted or moved elsewhere) fairly quickly, so the same comparisons won't be run repeatedly. The remaining cases of matching file size are so rare (around 80 in my full 50,000-file set) that pair-wise comparisons take only 4 seconds (or 2 seconds if I use an older version of LC), so there is no great impact on the user experience.
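
The direct comparison itself is trivial - a minimal sketch, with made-up handler and variable names (finding the same-size pairs to feed it is a separate step):

   function filesIdentical pPathA, pPathB
      -- read both files as binary data and compare them byte for byte
      put URL ("binfile:" & pPathA) into tDataA
      put URL ("binfile:" & pPathB) into tDataB
      return tDataA is tDataB
   end filesIdentical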

(The other parts of the overall workflow - where I would like to gather and use the exif data - are more strongly impacted by the performance issue, but my desire to use the latest LC8 rather than an obsolete version is probably strong enough to override that, and I'll just be more patient - even though patience is not my natural state :-)


