Hi Smit,

I hope you don't mind a couple of comments :)

I had a look at your edit-distance implementation and you've not followed the recommended approach found in various textbooks (dynamic-programming) and that's probably why you resorted to memoization...

You see, trying your code without memoization does not perform very well...In fact, for strings larger than 8-9 characters your code needs nearly 10 seconds!!! Also, unrestricted memoization is not really a viable option here. If you were to do any serious work (on a large corpus for instance), you'll find that your memory will keep increasing up to the point where your heap will fill up...


So, I can suggest 2 things for you...if you want to keep the 'naive' levenhstein-distance version, a more clever memoization strategy is what I 'd go for in order to avoid filling up your cache in a real program...However, if you want to do it the right way, which will get rid of your memoization needs, I suggest you look at my hotel_nlp project, in the 'algorithms' namespace:

 
https://github.com/jimpil/hotel-nlp/blob/master/src/hotel_nlp/algorithms/levenshtein.clj

One thing that you might find confusing is these 2 let bindings:

m (inc (count s1)) ; do not skip the last character
n (inc (count s2)) ; do not skip the last character

I'm only doing that because my 'matrix' fn uses 'range' which excludes the upper limit...I guess I could change that to be more evident...

even without memoization, the dynamic-programming solution can do two 16-character-long strings in just over 1ms which is pretty sweet... :)

hope that helps,

Jim



On 27/06/13 05:59, Smit Shah wrote:
Hey folks,

This is my first post on the mailing list, I would like to introduce myself. I have been love with Lisp since I started reading SICP, the way you express problems and decompose them it's just amazing. I always wanted to build a real world application in Lisp but I never got around it (working in a startup leaves you with very little time).

Recently, I discovered talks by Rich Hickey (thanks to Baishampayan Ghose!) and I was very impressed. I decided to learn Clojure so I picked up "Joy of Clojure" and wrote a small library for fuzzy matching using Levenshtein distance.

So I have an very early version of thefuzzy-matcher <https://github.com/Who828/fuzzy_matcher>, a library to fuzzily match strings in Clojure.

Currently it has two APIs,

  * edit_distance which gives you Levenshtein distance between two
    strings.
  * search which takes a string and a list of strings, finally it
    gives you a list of strings which sorted by the edit_distance.


I am planning to add more algorithms to this library like Sellers, Hamming, PairDistance etc. I just wanted feedback from you folks. I really want to help out the Clojure community by contributing back and learn lot of new things in the process.

Please if you have any API suggestions or algorithms you want me to add, let me know. I would do my best to add that feature as soon as possible.

Regards,
 Smit
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to