Hahaha - you beat me to it!

I expect the memory usage would be dominated by the slurping (if there are large files); perhaps using a DigestInputStream avoids this? (I'm not familiar with it, but it sounds... streamy.)

PS: you defined file-comparator but don't use it, so your source could be even shorter :P
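On the slurping point, here is a rough sketch of what a streaming hash might look like. The name stream-digest, the MD5 default and the 8 KB buffer are my own assumptions, not anything taken from file_dup_finder.clj; MessageDigest and DigestInputStream are the standard java.security classes. The idea is that only one buffer's worth of the file is in memory at a time.

```clojure
(import '(java.io FileInputStream)
        '(java.security MessageDigest DigestInputStream))

(defn stream-digest
  "Returns the hex digest of the file at path, reading it in fixed-size
  chunks instead of slurping the whole file into memory.
  (Hypothetical helper; name, default algorithm and buffer size are
  illustrative.)"
  ([path] (stream-digest path "MD5"))
  ([path algorithm]
   (let [md  (MessageDigest/getInstance algorithm)
         buf (byte-array 8192)]
     (with-open [in (DigestInputStream. (FileInputStream. (str path)) md)]
       ;; Every read through the DigestInputStream updates md as a side
       ;; effect, so the loop body can stay empty.
       (while (not= -1 (.read in buf))))
     ;; Render the digest bytes as lowercase hex.
     (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md))))))
```

Calling (stream-digest "a.txt") should then give the same kind of string a slurp-based hash would, just without holding the whole file at once.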
On May 28, 7:05 pm, Daniel Lyons <fus...@storytotell.org> wrote:
> I have uploaded my solution to the Google group. It seems to work well
> for small directories but runs out of memory pretty quickly on a huge
> directory. I'd appreciate any help with making it more efficient or
> prettier. I'm sure I can drum up my own uses for this.
>
> Also, I thought I could use -> in find-duplicate-files but it didn't
> seem to work out the way I thought it would.
>
> Style hints would be greatly appreciated as well. :)
>
> Thanks,
>
> http://clojure.googlegroups.com/web/file_dup_finder.clj
>
> On May 27, 2009, at 11:51 PM, Timothy Pratley wrote:
>
> > Yes that is a very elegant solution.
> > For convenience you might want another function:
> > (defn fast-compare
> >   "Given two filenames returns true if the files are identical"
> >   [fn1 fn2]
> >   (let [i1 (get-info fn1), i2 (get-info fn2)]
> >     (and (= (first i1) (first i2))
> >          (= (second i1) (second i2))
> >          (= (third i1) (third i2)))))
>
> > I wonder if there is a more idiomatic way to compare two lazy
> > sequences... lazily?
>
> > Regarding lazy-hash-map it just allows you to name the fields instead
> > of using an index, but I think without is better for something that
> > can be expressed like you have. No need for an external dependency.
> > (ooops I called it lazy-map before).
>
> > Regards,
> > Tim.
>
> > On May 28, 2:34 pm, Mikio Hokari <mikiohok...@gmail.com> wrote:
> >> Hash calculation runs only when necessary, because
> >> Clojure's map function is lazy now.
>
> >> more sample code:
>
> >> (nth (get-info "a.txt") 0)
> >> (nth (get-info "b.txt") 0)
> >> (nth (get-info "b.txt") 1)
>
> >> result:
> >> size a.txt
> >> size b.txt
> >> quickhash b.txt
>
> >> Output result shows it.
> >> When
> >> (nth (get-info "a.txt") 0)
> >> is evaluated, only get-size function runs.
> >> Evaluation of get-quickhash and get-hash is delayed.
>
> >> Eval
> >> (nth (get-info "a.txt") 1)
> >> cause evaluation of get-quickhash,
> >> but not get-hash.
>
> >> 2009/5/28 Timothy Pratley <timothyprat...@gmail.com>:
>
> >>> Sounds like a job for lazy-map to me!
> >>> http://kotka.de/projects/clojure/lazy-map.html
>
> >>> On May 28, 11:52 am, Korny Sietsma <ko...@sietsma.com> wrote:
> >>>> Hi all,
>
> >>>> I have some ruby code that I'm thinking of porting to clojure, but I'm
> >>>> not sure how to translate this idiom to a functional world:
> >>>> I have objects that are externally immutable, but have internal
> >>>> mutable state they use for optimisation, specifically in this case to
> >>>> defer un-needed calculations.
>
> >>>> Basically, I have a FileInfo class that wraps a data file, used to
> >>>> compare lots of files on my system.
> >>>> It has an "exact_match" method similar to:
> >>>> def exact_match(other)
> >>>>   return false if size != other.size
> >>>>   return false if quickhash() != other.quickhash()
> >>>>   return hash() != other.hash()
> >>>> end
>
> >>>> quickhash and hash store their results in instance variables so they
> >>>> only need to do the expensive calculations once - and quite often they
> >>>> never need to get calculated at all; I'm looking for duplicate files,
> >>>> but many files have no duplicate, so probably never need to have their
> >>>> contents hashed.
>
> >>>> How would I do this in a functional way? My first effort would be
> >>>> something like
> >>>> (defn hash [filename] (memoize (... hash function ...)))
> >>>> but I have a couple of problems with this:
> >>>> - it doesn't seem to store the hash value with the rest of the file
> >>>> information, which feels a bit ugly
> >>>> - I assume it means storing the full filename three times, once in
> >>>> the original file info structure, once in the memoized hash function,
> >>>> and once in the memoized quickhash function. My program struggles to
> >>>> get enough RAM to track as many files as I'd like already - storing
> >>>> the filename multiple times would blow out memory quite badly.
>
> >>>> I guess I could define a unique key for each filename, and define hash
> >>>> as a function on that key, but then hash would need to be able to
> >>>> access the list of filenames somehow. It's starting to get beyond me
> >>>> - I'm hoping there's a simpler option!
>
> >>>> Any suggestions? I'd hope this is not an uncommon idiom.
>
> >>>> - Korny
>
> >>>> --
> >>>> Kornelis Sietsma  korny at my surname dot com
> >>>> "Every jumbled pile of person has a thinking part
> >>>> that wonders what the part that isn't thinking
> >>>> isn't thinking of"
>
> --
> Daniel Lyons
> http://www.storytotell.org -- Tell It!