Hahaha - you beat me to it!

I expect the memory usage would be dominated by the slurping (if there
are large files); perhaps using a DigestInputStream avoids this? (I'm
not familiar with it, but it sounds... streamy.) PS: you defined
file-comparator but don't use it, so your source could be even shorter :P
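Here is a minimal sketch of the DigestInputStream idea. It assumes SHA-1 is an acceptable hash and that the files are currently hashed via slurp. stream-hash is a hypothetical name, not something in file_dup_finder.clj. The file is read in 8 KB chunks, so memory use stays flat no matter how large the file is:

```clojure
(import '(java.io FileInputStream)
        '(java.security MessageDigest DigestInputStream))

(defn stream-hash
  "Hypothetical helper: hex SHA-1 of the file at path, computed by
  streaming it through a DigestInputStream instead of slurping it."
  [path]
  (let [md  (MessageDigest/getInstance "SHA-1")
        buf (byte-array 8192)]
    (with-open [in (DigestInputStream. (FileInputStream. path) md)]
      ;; Each read updates md as a side effect; the bytes are discarded.
      (while (not= -1 (.read in buf))))
    ;; Render each digest byte as two lowercase hex digits.
    (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md)))))
```

The buffer size is arbitrary. Anything from a few KB upward behaves the same, apart from the number of read calls.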


On May 28, 7:05 pm, Daniel Lyons <fus...@storytotell.org> wrote:
> I have uploaded my solution to the Google group. It seems to work well  
> for small directories but runs out of memory pretty quickly on a huge  
> directory. I'd appreciate any help with making it more efficient or  
> prettier. I'm sure I can drum up my own uses for this.
>
> Also, I thought I could use -> in find-duplicate-files but it didn't  
> seem to work out the way I thought it would.
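This is a guess about the -> problem, since find-duplicate-files isn't quoted here. -> threads each result in as the first argument of the next form, but map, filter and friends take the collection last. The tiny sketch below shows the difference. Note that ->> (thread-last) only exists in Clojure 1.1 and later:

```clojure
;; -> puts the threaded value first: this is (sort (conj [3 1 2] 0)).
(-> [3 1 2] (conj 0) sort)

;; (-> [3 1 2] (map inc)) would become (map [3 1 2] inc): wrong argument order.
;; ->> puts the threaded value last: (filter odd? (map inc [3 1 2])).
(->> [3 1 2] (map inc) (filter odd?))
```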
>
> Style hints would be greatly appreciated as well. :)
>
> Thanks,
>
> http://clojure.googlegroups.com/web/file_dup_finder.clj
>
> On May 27, 2009, at 11:51 PM, Timothy Pratley wrote:
>
> > Yes that is a very elegant solution.
> > For convenience you might want another function:
> > (defn fast-compare
> >  "Given two filenames returns true if the files are identical"
> >  [fn1 fn2]
> >  (let [i1 (get-info fn1), i2 (get-info fn2)]
> >    (and (= (first i1) (first i2))
> >         (= (second i1) (second i2))
> >         (= (nth i1 2) (nth i2 2)))))
>
> > I wonder if there is a more idiomatic way to compare two lazy
> > sequences... lazily?
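On comparing lazy sequences lazily: as far as I can tell, plain = already does this, because it walks both seqs pairwise and stops at the first mismatch. One caveat: in Clojure 1.1 and later, map over a vector is chunked (up to 32 elements realized at once), so this sketch builds an unchunked seq by hand. get-info here is a hypothetical stand-in that logs each step, not the real function:

```clojure
(defn lazy-steps
  "Unchunked lazy seq: calls each zero-arg fn only when its element is needed."
  [thunks]
  (lazy-seq
    (when-let [[t & more] (seq thunks)]
      (cons (t) (lazy-steps more)))))

(defn get-info
  "Hypothetical stand-in: size, quickhash and full-hash are plain values,
  and each step prints its name when it is realized."
  [fname size quickhash full-hash]
  (lazy-steps [#(do (println "size" fname) size)
               #(do (println "quickhash" fname) quickhash)
               #(do (println "hash" fname) full-hash)]))

(defn fast-compare
  "True if two info seqs are equal. = stops at the first differing
  element, so the later (expensive) steps are never realized."
  [i1 i2]
  (= i1 i2))

;; Sizes differ: prints only "size a.txt" and "size b.txt", returns false.
(fast-compare (get-info "a.txt" 10 "q1" "h1")
              (get-info "b.txt" 20 "q2" "h2"))
```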
>
> > Regarding lazy-hash-map: it just allows you to name the fields instead
> > of using an index, but I think going without it is better for something
> > that can be expressed the way you have. No need for an external
> > dependency. (Oops, I called it lazy-map before.)
>
> > Regards,
> > Tim.
>
> > On May 28, 2:34 pm, Mikio Hokari <mikiohok...@gmail.com> wrote:
> >> Hash calculation runs only when necessary, because
> >> Clojure's map function is lazy now.
>
> >> More sample code:
>
> >> (nth (get-info "a.txt") 0)
> >> (nth (get-info "b.txt") 0)
> >> (nth (get-info "b.txt") 1)
>
> >> result:
> >> size a.txt
> >> size b.txt
> >> quickhash b.txt
>
> >> The output shows it.
> >> When
> >>  (nth (get-info "a.txt") 0)
> >> is evaluated, only the get-size function runs;
> >> evaluation of get-quickhash and get-hash is delayed.
>
> >> Evaluating
> >>  (nth (get-info "a.txt") 1)
> >> causes evaluation of get-quickhash,
> >> but not get-hash.
>
> >> 2009/5/28 Timothy Pratley <timothyprat...@gmail.com>:
>
> >>> Sounds like a job for lazy-map to me!
> >>> http://kotka.de/projects/clojure/lazy-map.html
>
> >>> On May 28, 11:52 am, Korny Sietsma <ko...@sietsma.com> wrote:
> >>>> Hi all,
>
> >>>> I have some Ruby code that I'm thinking of porting to Clojure, but
> >>>> I'm not sure how to translate this idiom to a functional world:
> >>>> I have objects that are externally immutable, but have internal
> >>>> mutable state they use for optimisation, specifically in this case
> >>>> to defer un-needed calculations.
>
> >>>> Basically, I have a FileInfo class that wraps a data file, used to
> >>>> compare lots of files on my system.
> >>>> It has an "exact_match" method similar to:
> >>>>   def exact_match(other)
> >>>>      return false if size != other.size
> >>>>      return false if quickhash() != other.quickhash()
> >>>>      return hash() == other.hash()
> >>>>   end
>
> >>>> quickhash and hash store their results in instance variables so they
> >>>> only need to do the expensive calculations once - and quite often they
> >>>> never need to get calculated at all; I'm looking for duplicate files,
> >>>> but many files have no duplicate, so they probably never need to have
> >>>> their contents hashed.
>
> >>>> How would I do this in a functional way?  My first effort would be
> >>>> something like
> >>>>     (defn hash [filename] (memoize (... hash function ...)))
> >>>> but I have a couple of problems with this:
> >>>>   - it doesn't seem to store the hash value with the rest of the file
> >>>> information, which feels a bit ugly
> >>>>   - I assume it means storing the full filename three times, once in
> >>>> the original file info structure, once in the memoized hash function,
> >>>> and once in the memoized quickhash function.  My program struggles to
> >>>> get enough RAM to track as many files as I'd like already - storing
> >>>> the filename multiple times would blow out memory quite badly.
>
> >>>> I guess I could define a unique key for each filename, and define hash
> >>>> as a function on that key, but then hash would need to be able to
> >>>> access the list of filenames somehow.  It's starting to get beyond me
> >>>> - I'm hoping there's a simpler option!
>
> >>>> Any suggestions?  I'd hope this is not an uncommon idiom.
>
> >>>> - Korny
>
> >>>> --
> >>>> Kornelis Sietsma  korny at my surname dot com
> >>>> "Every jumbled pile of person has a thinking part
> >>>> that wonders what the part that isn't thinking
> >>>> isn't thinking of"
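For the original question, one idiomatic option is delay. The file info stays an immutable map, but :quickhash and :hash hold delays. Each delay runs its calculation at most once, and only when forced with @, much like the Ruby instance-variable caching. This is a sketch under assumptions: quick-fn and hash-fn are hypothetical functions of a path (e.g. a hash of the first few KB and a hash of the whole file), and exact-match? is a made-up name mirroring exact_match:

```clojure
(defn file-info
  "Immutable description of a file. The two hashes are delays, so each is
  computed at most once, and only if a comparison actually needs it."
  [^String path quick-fn hash-fn]
  {:path      path
   :size      (.length (java.io.File. path))
   :quickhash (delay (quick-fn path))
   :hash      (delay (hash-fn path))})

(defn exact-match?
  "Cheapest check first; because and short-circuits, a later delay is
  forced (with @) only if every earlier check passed."
  [a b]
  (and (= (:size a) (:size b))
       (= @(:quickhash a) @(:quickhash b))
       (= @(:hash a) @(:hash b))))
```

On the memory worry: the map and both delay closures hold references to the same path String, so the filename itself isn't copied three times. The cached hashes also live and die with the map, rather than sitting in a global memoize table.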
>
> --
> Daniel Lyons http://www.storytotell.org -- Tell It!
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
