I have uploaded my solution to the Google group. It seems to work well  
for small directories but runs out of memory pretty quickly on a huge  
directory. I'd appreciate any help with making it more efficient or  
prettier. I'm sure I can drum up my own uses for this.

Also, I thought I could use -> in find-duplicate-files, but it didn't
work out the way I expected.
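
One possible cause, assuming the body of find-duplicate-files is a pipeline of sequence functions (the uploaded code is not reproduced here, so this is a guess): `->` threads its value in as the first argument of each form, while `map`, `filter` and friends take the collection last. In current Clojure, `->>` threads into the last position instead. The `files` data below is hypothetical.

```clojure
;; Hypothetical data: [filename size] pairs standing in for real files.
(def files [["a.txt" 10] ["b.txt" 20] ["c.txt" 10]])

;; -> inserts the threaded value as the FIRST argument of each form,
;; which suits functions such as assoc:
(def m (-> {}
           (assoc :a 1)
           (assoc :b 2)))

;; ->> inserts it as the LAST argument, which matches filter and map:
(def big-names
  (->> files
       (filter #(> (second %) 15))   ; keep pairs whose size exceeds 15
       (map first)))                 ; then keep just the filename
```

Writing the second pipeline with `->` would expand to `(filter files #(...))`, with the arguments in the wrong order.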

Style hints would be greatly appreciated as well. :)

Thanks,

http://clojure.googlegroups.com/web/file_dup_finder.clj

On May 27, 2009, at 11:51 PM, Timothy Pratley wrote:

>
> Yes that is a very elegant solution.
> For convenience you might want another function:
> (defn fast-compare
>  "Given two filenames returns true if the files are identical"
>  [fn1 fn2]
>  (let [i1 (get-info fn1), i2 (get-info fn2)]
>    (and (= (first i1) (first i2))
>         (= (second i1) (second i2))
>         (= (nth i1 2) (nth i2 2)))))
>
> I wonder if there is a more idiomatic way to compare two lazy
> sequences... lazily?
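
A minimal sketch of one way to do this, not necessarily the most idiomatic: `map` over both sequences with `=` and let `every?` stop at the first mismatch. The names `seqs-match?`, `tracked` and `realized` are hypothetical and exist only for the demonstration. Note that `map` stops at the shorter sequence, so this only suits sequences of equal length, such as the fixed three-element info here.

```clojure
(defn seqs-match?
  "True when every pair of corresponding elements is equal.
  every? stops at the first false, so later elements stay unrealized."
  [s1 s2]
  (every? true? (map = s1 s2)))

;; Demonstration helpers (hypothetical): log each element as it is realized.
(def realized (atom []))

(defn tracked
  "Builds a lazy seq of vals that realizes one element per step,
  recording [label value] in `realized` each time."
  [label vals]
  (lazy-seq
    (when-let [[v & more] (seq vals)]
      (swap! realized conj [label v])
      (cons v (tracked label more)))))

;; The second elements differ, so the third elements are never computed.
(seqs-match? (tracked :a [1 2 3]) (tracked :b [1 9 3]))   ; => false
```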
>
>
> Regarding lazy-hash-map, it just allows you to name the fields instead
> of using an index, but I think going without it is better for something
> that can be expressed the way you have it. No need for an external
> dependency. (Oops, I called it lazy-map before.)
>
>
> Regards,
> Tim.
>
> On May 28, 2:34 pm, Mikio Hokari <mikiohok...@gmail.com> wrote:
>> Hash calculation runs only when necessary, because
>> Clojure's map function is lazy now.
>>
>> more sample code:
>>
>> (nth (get-info "a.txt") 0)
>> (nth (get-info "b.txt") 0)
>> (nth (get-info "b.txt") 1)
>>
>> result:
>> size a.txt
>> size b.txt
>> quickhash b.txt
>>
>> The output shows this. When
>>  (nth (get-info "a.txt") 0)
>> is evaluated, only the get-size function runs.
>> Evaluation of get-quickhash and get-hash is deferred.
>>
>> Evaluating
>>  (nth (get-info "a.txt") 1)
>> causes get-quickhash to run,
>> but not get-hash.
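
A caveat for current Clojure versions: `map` over a vector yields a chunked sequence that realizes up to 32 elements at a time, so if get-info is built as a `map` over a vector of functions (an assumption, since its definition is not quoted here), all three functions would run on first access. The sketch below builds the sequence with `lazy-seq` directly so each call is deferred individually. `lazy-calls` is a hypothetical helper, and the three get-* functions are stand-ins that only print their names.

```clojure
(defn lazy-calls
  "Returns a lazy seq of (f x) for each f in fs, realizing one call per step.
  Built on lazy-seq directly, so no chunking applies."
  [fs x]
  (lazy-seq
    (when-let [[f & more] (seq fs)]
      (cons (f x) (lazy-calls more x)))))

;; Stand-ins (hypothetical) for the real functions; each prints its name
;; so the evaluation order is visible.
(defn get-size      [f] (println "size" f)      (count f))
(defn get-quickhash [f] (println "quickhash" f) (hash (first f)))
(defn get-hash      [f] (println "hash" f)      (hash f))

(defn get-info [filename]
  (lazy-calls [get-size get-quickhash get-hash] filename))

(nth (get-info "a.txt") 0)   ; runs get-size only
(nth (get-info "b.txt") 1)   ; runs get-size, then get-quickhash
```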
>>
>> 2009/5/28 Timothy Pratley <timothyprat...@gmail.com>:
>>
>>
>>
>>> Sounds like a job for lazy-map to me!
>>> http://kotka.de/projects/clojure/lazy-map.html
>>
>>> On May 28, 11:52 am, Korny Sietsma <ko...@sietsma.com> wrote:
>>>> Hi all,
>>
>>>> I have some ruby code that I'm thinking of porting to clojure, but I'm
>>>> not sure how to translate this idiom to a functional world:
>>>> I have objects that are externally immutable, but have internal
>>>> mutable state they use for optimisation, specifically in this case to
>>>> defer un-needed calculations.
>>
>>>> Basically, I have a FileInfo class that wraps a data file, used to
>>>> compare lots of files on my system.
>>>> It has an "exact_match" method similar to:
>>>>   def exact_match(other)
>>>>      return false if size != other.size
>>>>      return false if quickhash() != other.quickhash()
>>>>      return hash() == other.hash()
>>>>   end
>>
>>>> quickhash and hash store their results in instance variables so they
>>>> only need to do the expensive calculations once - and quite often they
>>>> never need to get calculated at all;  I'm looking for duplicate files,
>>>> but many files have no duplicate, so probably never need to have their
>>>> contents hashed.
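
The observation that most files never need hashing can be sketched as a pre-filter: group by size first and discard every file whose size is unique. `duplicate-candidates` and the `sizes` lookup are hypothetical names used for illustration, and `size-fn` stands in for whatever returns a file's size.

```clojure
(defn duplicate-candidates
  "Groups items by (size-fn item) and keeps only groups with 2+ members.
  Items with a unique size are dropped without ever being hashed."
  [size-fn items]
  (->> items
       (group-by size-fn)   ; {size [item ...], ...}
       vals
       (filter next)))      ; (next v) is nil for one-element groups

;; Hypothetical sizes; a map works as the size function here.
(def sizes {"a.txt" 10, "b.txt" 20, "c.txt" 10})

(duplicate-candidates sizes ["a.txt" "b.txt" "c.txt"])
;; => (["a.txt" "c.txt"])
```

Only the surviving groups would then need the quick hash, and only groups that still collide after that would need the full hash.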
>>
>>>> How would I do this in a functional way?  My first effort would be
>>>> something like
>>>>     (def hash (memoize (fn [filename] (... hash function ...))))
>>>> but I have a couple of problems with this:
>>>>   - it doesn't seem to store the hash value with the rest of the file
>>>>     information, which feels a bit ugly
>>>>   - I assume it means storing the full filename three times, once in
>>>>     the original file info structure, once in the memoized hash function,
>>>>     and once in the memoized quickhash function.  My program struggles to
>>>>     get enough RAM to track as many files as I'd like already - storing
>>>>     the filename multiple times would blow out memory quite badly.
>>
>>>> I guess I could define a unique key for each filename, and define hash
>>>> as a function on that key, but then hash would need to be able to
>>>> access the list of filenames somehow.  It's starting to get beyond me
>>>> - I'm hoping there's a simpler option!
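
One simpler option, sketched here as an alternative to both memoize and the lazy sequence discussed elsewhere in the thread: keep the deferred values inside the file-info map as `delay`s. Each delay runs at most once and only when dereferenced, the cached result lives with the rest of the file information, and the closures refer to the same filename string rather than a copy. `file-info`, `exact-match?` and the three function parameters are hypothetical names.

```clojure
(defn file-info
  "Builds an immutable map for filename. :quickhash and :hash are delays,
  so each runs at most once, and only if something derefs it.
  size-fn, quickhash-fn and hash-fn are caller-supplied (hypothetical)."
  [filename size-fn quickhash-fn hash-fn]
  {:name      filename
   :size      (size-fn filename)
   :quickhash (delay (quickhash-fn filename))
   :hash      (delay (hash-fn filename))})

(defn exact-match?
  "Cheapest check first. `and` short-circuits, so a size mismatch
  never forces either hash, and a quickhash mismatch never forces :hash."
  [a b]
  (and (= (:size a) (:size b))
       (= @(:quickhash a) @(:quickhash b))
       (= @(:hash a) @(:hash b))))
```

Comparing two infos of different sizes returns false without calling quickhash-fn or hash-fn at all; comparing the same pair twice reuses the values cached in the delays.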
>>
>>>> Any suggestions?  I'd hope this is not an uncommon idiom.
>>
>>>> - Korny
>>
>>>> --
>>>> Kornelis Sietsma  korny at my surname dot com
>>>> "Every jumbled pile of person has a thinking part
>>>> that wonders what the part that isn't thinking
>>>> isn't thinking of"

—
Daniel Lyons
http://www.storytotell.org -- Tell It!

