I've done quite a lot of work in this area, although not in Clojure. As Mark mentioned, WordNet is definitely a good place to start, but it's short on proper nouns, which limits its usefulness when analyzing natural language. I ended up extending WordNet by data mining Wikipedia dumps. The relationship between an article and its category is essentially the same as that between a word and its hypernym. Likewise, a redirect and the article it points to are essentially synonyms.
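A minimal Python sketch of that mapping follows. It assumes the (page, category) and (redirect, target) pairs have already been extracted from a dump. The sample rows and the function names `build_hypernyms`, `build_synonyms` and `ancestors` are hypothetical, not part of WordNet or any Wikipedia tool. Category links become hypernym edges and redirects become synonyms of their target. Walking the category edges upward then gives a term's transitive hypernyms, much like following hypernym pointers in WordNet.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for data mined from a Wikipedia dump.
# (page, category): an article (or a category page) and a category it is filed under.
CATEGORY_LINKS = [
    ("Paris", "Capitals in Europe"),
    ("Paris", "Cities in France"),
    ("Capitals in Europe", "Cities"),
    ("Cities in France", "Cities"),
]
# (redirect title, target article): redirects point alternate names at one article.
REDIRECTS = [
    ("City of Light", "Paris"),
    ("Paris, France", "Paris"),
]


def build_hypernyms(links):
    """Map each page to the set of categories it is filed under (its hypernyms)."""
    hypernyms = defaultdict(set)
    for page, category in links:
        hypernyms[page].add(category)
    return hypernyms


def build_synonyms(redirects):
    """Return (alias -> target) and (target -> set of aliases) maps."""
    canonical = dict(redirects)
    synonyms = defaultdict(set)
    for alias, target in redirects:
        synonyms[target].add(alias)
    return canonical, synonyms


def ancestors(term, hypernyms, canonical):
    """All transitive hypernyms of a term, following categories upward.

    A visited set guards against cycles, which the category graph may contain.
    """
    start = canonical.get(term, term)  # resolve a redirect to its article first
    seen = set()
    stack = list(hypernyms.get(start, ()))
    while stack:
        category = stack.pop()
        if category not in seen:
            seen.add(category)
            stack.extend(hypernyms.get(category, ()))
    return seen


HYPERNYMS = build_hypernyms(CATEGORY_LINKS)
CANONICAL, SYNONYMS = build_synonyms(REDIRECTS)

if __name__ == "__main__":
    print(sorted(SYNONYMS["Paris"]))
    print(sorted(ancestors("City of Light", HYPERNYMS, CANONICAL)))
```

Because "City of Light" is resolved to "Paris" before the walk, it gets the same hypernyms as the article itself. A real dump would also need filtering of hidden and maintenance categories, which this sketch ignores.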
The whole problem is more complex than it first appears because of word senses. For example, how related are "shot" and "assassinated"? Very, if by "shot" you mean the past tense of "shoot", but not so much if you're referring to a shot of vodka. As far as I know, word sense disambiguation (WSD) is still very much an unsolved problem. You'll also want to get a feel for part-of-speech tagging, which is usually a precursor to WSD. It's an interesting problem, and I enjoyed working on it. I don't have any papers handy, but search for deep parsing and semantic similarity in natural language processing and you'll get a feel for the field.

-lance

On Jul 28, 2:34 pm, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> Wordnet is the main existing thing that comes to mind as related to your
> idea.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
For more options, visit this group at http://groups.google.com/group/clojure?hl=en