I have uploaded my solution to the Google group. It seems to work well for small directories but runs out of memory pretty quickly on a huge directory. I'd appreciate any help with making it more efficient or prettier. I'm sure I can drum up my own uses for this.
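One possible direction for the memory problem is sketched below under stated assumptions. The original find-duplicate-files is not reproduced here, so this is not Daniel's code. The idea is to group files by size first and hash only files that share a size with some other file. Each file is streamed through a digest in fixed-size chunks instead of being loaded whole. `find-duplicates` and `sha1-hex` are hypothetical names, and SHA-1 is an arbitrary choice of digest. The sketch assumes Clojure 1.2 or later, for `clojure.java.io` and `group-by` in core.

```clojure
(ns dup-sketch
  (:require [clojure.java.io :as io])
  (:import (java.security MessageDigest)))

(defn- sha1-hex
  "Hypothetical helper: hex SHA-1 of a file's contents, read in 8 KB chunks
  so the whole file is never held in memory."
  [f]
  (let [md  (MessageDigest/getInstance "SHA-1")
        buf (byte-array 8192)]
    (with-open [in (io/input-stream f)]
      (loop []
        (let [n (.read in buf)]        ; -1 at end of stream
          (when (pos? n)
            (.update md buf 0 n)
            (recur)))))
    (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md)))))

(defn find-duplicates
  "Hypothetical sketch: returns a seq of groups of paths whose files have
  identical contents. Only files that share a size with another file are hashed."
  [dir]
  (->> (file-seq (io/file dir))
       (filter #(.isFile ^java.io.File %))
       (group-by #(.length ^java.io.File %))      ; size -> [File ...]
       vals
       (filter #(> (count %) 1))                  ; unique sizes can't be duplicates
       (mapcat #(vals (group-by sha1-hex %)))     ; split each size group by content hash
       (filter #(> (count %) 1))
       (map #(map str %))))
```

The pipeline uses `->>` rather than `->`. `->` threads the value into the first argument position, but `filter`, `group-by`, `mapcat` and `map` take their collection last. That may be why `->` did not behave as expected. Memory still grows with the number of files, because `group-by` holds one `File` per file. However, nothing is hashed or stored for files with a unique size.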
Also, I thought I could use -> in find-duplicate-files, but it didn't seem to work out the way I thought it would. Style hints would be greatly appreciated as well. :)

Thanks,

http://clojure.googlegroups.com/web/file_dup_finder.clj

On May 27, 2009, at 11:51 PM, Timothy Pratley wrote:

>
> Yes, that is a very elegant solution.
> For convenience you might want another function:
> (defn fast-compare
>   "Given two filenames returns true if the files are identical"
>   [fn1 fn2]
>   (let [i1 (get-info fn1), i2 (get-info fn2)]
>     (and (= (first i1) (first i2))
>          (= (second i1) (second i2))
>          (= (nth i1 2) (nth i2 2)))))
>
> I wonder if there is a more idiomatic way to compare two lazy
> sequences... lazily?
>
> Regarding lazy-hash-map: it just lets you name the fields instead
> of using an index, but I think doing without it is better for something
> that can be expressed the way you have. No need for an external dependency.
> (Oops, I called it lazy-map before.)
>
> Regards,
> Tim.
>
> On May 28, 2:34 pm, Mikio Hokari <mikiohok...@gmail.com> wrote:
>> Hash calculation runs only when necessary, because
>> Clojure's map function is lazy now.
>>
>> More sample code:
>>
>> (nth (get-info "a.txt") 0)
>> (nth (get-info "b.txt") 0)
>> (nth (get-info "b.txt") 1)
>>
>> Result:
>> size a.txt
>> size b.txt
>> quickhash b.txt
>>
>> The output shows this.
>> When
>> (nth (get-info "a.txt") 0)
>> is evaluated, only the get-size function runs.
>> Evaluation of get-quickhash and get-hash is delayed.
>>
>> Evaluating
>> (nth (get-info "a.txt") 1)
>> causes evaluation of get-quickhash,
>> but not get-hash.
>>
>> 2009/5/28 Timothy Pratley <timothyprat...@gmail.com>:
>>
>>> Sounds like a job for lazy-map to me!
>>> http://kotka.de/projects/clojure/lazy-map.html
>>
>>> On May 28, 11:52 am, Korny Sietsma <ko...@sietsma.com> wrote:
>>>> Hi all,
>>>>
>>>> I have some Ruby code that I'm thinking of porting to Clojure, but I'm
>>>> not sure how to translate this idiom to a functional world:
>>>> I have objects that are externally immutable, but have internal
>>>> mutable state they use for optimisation, specifically in this case to
>>>> defer unneeded calculations.
>>>>
>>>> Basically, I have a FileInfo class that wraps a data file, used to
>>>> compare lots of files on my system.
>>>> It has an "exact_match" method similar to:
>>>>   def exact_match(other)
>>>>     return false if size != other.size
>>>>     return false if quickhash() != other.quickhash()
>>>>     return hash() == other.hash()
>>>>   end
>>>>
>>>> quickhash and hash store their results in instance variables so they
>>>> only need to do the expensive calculations once - and quite often they
>>>> never need to be calculated at all; I'm looking for duplicate files,
>>>> but many files have no duplicate, so they probably never need to have
>>>> their contents hashed.
>>>>
>>>> How would I do this in a functional way? My first effort would be
>>>> something like
>>>>   (def hash (memoize (fn [filename] (... hash function ...))))
>>>> but I have a couple of problems with this:
>>>> - it doesn't seem to store the hash value with the rest of the file
>>>>   information, which feels a bit ugly
>>>> - I assume it means storing the full filename three times: once in
>>>>   the original file info structure, once in the memoized hash function,
>>>>   and once in the memoized quickhash function. My program struggles to
>>>>   get enough RAM to track as many files as I'd like already - storing
>>>>   the filename multiple times would blow out memory quite badly.
>>>>
>>>> I guess I could define a unique key for each filename, and define hash
>>>> as a function on that key, but then hash would need to be able to
>>>> access the list of filenames somehow. It's starting to get beyond me
>>>> - I'm hoping there's a simpler option!
>>>>
>>>> Any suggestions? I'd hope this is not an uncommon idiom.
>>>>
>>>> - Korny
>>>>
>>>> --
>>>> Kornelis Sietsma  korny at my surname dot com
>>>> "Every jumbled pile of person has a thinking part
>>>> that wonders what the part that isn't thinking
>>>> isn't thinking of"
>

--
Daniel Lyons
http://www.storytotell.org -- Tell It!
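Below is one functional answer to Korny's question. It is a sketch under assumptions, not something proposed in the thread. The expensive fields live in the same map as the file, each wrapped in `delay`, so each is computed at most once and only when first dereferenced. The delays all capture the same `File` object, so the path is not copied three times. `file-info`, `exact-match?`, `digest-hex` and the 4096-byte quick-hash size are hypothetical, and SHA-1 stands in for whatever hash the Ruby code used. A `delay` per field also sidesteps chunked sequences. In Clojure 1.1 and later, `map` over a vector realizes up to 32 elements at a time, so a `map`-based get-info may compute more than it needs.

```clojure
(ns lazy-file-info
  (:require [clojure.java.io :as io])
  (:import (java.security MessageDigest)))

(defn- digest-hex
  "Hypothetical helper: hex SHA-1 of the first `limit` bytes of file f
  (the whole file if it is shorter), read in 8 KB chunks."
  [f limit]
  (let [md  (MessageDigest/getInstance "SHA-1")
        buf (byte-array 8192)]
    (with-open [in (io/input-stream f)]
      (loop [remaining limit]
        (when (pos? remaining)
          (let [n (.read in buf 0 (int (min remaining (alength buf))))]
            (when (pos? n)                 ; -1 at end of stream
              (.update md buf 0 n)
              (recur (- remaining n)))))))
    (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md)))))

(def quick-bytes 4096) ; hypothetical: leading bytes read by the cheap "quickhash"

(defn file-info
  "Map describing the file at path. :size is read eagerly; :quickhash and :hash
  are delays, each computed at most once and only when first dereferenced."
  [path]
  (let [f (io/file path)]
    {:file      f
     :size      (.length f)
     :quickhash (delay (digest-hex f quick-bytes))
     :hash      (delay (digest-hex f Long/MAX_VALUE))}))

(defn exact-match?
  "Cheapest test first, like the Ruby exact_match. `and` stops at the first
  mismatch, so the delays are forced only for files that might be duplicates."
  [a b]
  (and (= (:size a) (:size b))
       (= @(:quickhash a) @(:quickhash b))
       (= @(:hash a) @(:hash b))))
```

This also addresses Tim's question about comparing lazily. `and` stops at the first false comparison, so a size mismatch never forces either hash.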