Hi Smit,
I hope you don't mind a couple of comments :)
I had a look at your edit-distance implementation and you've not
followed the recommended approach found in various textbooks
(dynamic-programming) and that's probably why you resorted to memoization...
You see, trying your code without memoization does not perform very
well...In fact, for strings larger than 8-9 characters your code needs
nearly 10 seconds!!! Also, unrestricted memoization is not really a
viable option here. If you were to do any serious work (on a large
corpus for instance), you'll find that your memory will keep increasing
up to the point where your heap will fill up...
So, I can suggest 2 things for you...if you want to keep the 'naive'
levenhstein-distance version, a more clever memoization strategy is
what I 'd go for in order to avoid filling up your cache in a real
program...However, if you want to do it the right way, which will get
rid of your memoization needs, I suggest you look at my hotel_nlp
project, in the 'algorithms' namespace:
https://github.com/jimpil/hotel-nlp/blob/master/src/hotel_nlp/algorithms/levenshtein.clj
One thing that you might find confusing is these 2 let bindings:
m (inc (count s1)) ; do not skip the last character
n (inc (count s2)) ; do not skip the last character
I'm only doing that because my 'matrix' fn uses 'range' which excludes
the upper limit...I guess I could change that to be more evident...
even without memoization, the dynamic-programming solution can do two
16-character-long strings in just over 1ms which is pretty sweet... :)
hope that helps,
Jim
On 27/06/13 05:59, Smit Shah wrote:
Hey folks,
This is my first post on the mailing list, I would like to introduce
myself. I have been love with Lisp since I started reading SICP, the
way you express problems and decompose them it's just amazing. I
always wanted to build a real world application in Lisp but I never
got around it (working in a startup leaves you with very little time).
Recently, I discovered talks by Rich Hickey (thanks to Baishampayan
Ghose!) and I was very impressed. I decided to learn Clojure so I
picked up "Joy of Clojure" and wrote a small library for fuzzy
matching using Levenshtein distance.
So I have an very early version of thefuzzy-matcher
<https://github.com/Who828/fuzzy_matcher>, a library to fuzzily match
strings in Clojure.
Currently it has two APIs,
* edit_distance which gives you Levenshtein distance between two
strings.
* search which takes a string and a list of strings, finally it
gives you a list of strings which sorted by the edit_distance.
I am planning to add more algorithms to this library like Sellers,
Hamming, PairDistance etc. I just wanted feedback from you folks.
I really want to help out the Clojure community by contributing back
and learn lot of new things in the process.
Please if you have any API suggestions or algorithms you want me to
add, let me know. I would do my best to add that feature as soon as
possible.
Regards,
Smit
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient
with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.