I've done quite a lot of work in this area, although not in Clojure.
As Mark mentioned, WordNet is definitely a good place to start, but
it's short on proper nouns, which limits its usefulness for analyzing
natural language. I ended up extending WordNet with data mined from
Wikipedia dumps. The relationship between an article and its
category is essentially the same as the one between a word and its
hypernym. The same is true of redirects and synonyms.
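A minimal sketch of that mapping, assuming the (article, category) and (redirect, target) pairs have already been pulled out of a dump. The function names and sample pairs are hypothetical, and this is not the code described above; parsing the dump itself is a separate job. Wikipedia's category graph is not a clean tree and can contain cycles, so the closure walk keeps a visited set.

```python
from collections import defaultdict

def build_lexicon(category_links, redirects):
    """Fold Wikipedia-style links into WordNet-style relations (hypothetical helper).

    category_links: (article, category) pairs, treated as (word, hypernym).
    redirects: (redirect_title, target_title) pairs, treated as synonyms.
    """
    hypernyms = defaultdict(set)
    synonyms = defaultdict(set)
    for article, category in category_links:
        hypernyms[article].add(category)
    for alias, target in redirects:
        # Synonymy is symmetric, so record the link in both directions.
        synonyms[target].add(alias)
        synonyms[alias].add(target)
    return hypernyms, synonyms

def hypernym_closure(word, hypernyms):
    """Return every ancestor reachable by following hypernym links."""
    seen = set()
    stack = list(hypernyms.get(word, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guards against cycles in the category graph
            seen.add(parent)
            stack.extend(hypernyms.get(parent, ()))
    return seen

# Toy pairs, for illustration only (not real dump output).
links = [
    ("Abraham Lincoln", "Presidents of the United States"),
    ("Presidents of the United States", "Heads of state"),
]
redirects = [("Honest Abe", "Abraham Lincoln")]

hypernyms, synonyms = build_lexicon(links, redirects)
print(sorted(hypernym_closure("Abraham Lincoln", hypernyms)))
# ['Heads of state', 'Presidents of the United States']
print(sorted(synonyms["Honest Abe"]))
# ['Abraham Lincoln']
```

Category-to-category links come along for free: a category's own parent categories become hypernyms of its hypernyms.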

The whole problem is more complex than it appears at first glance
because of word senses. For example, how related are "shot" and
"assassinated"? Very, if by "shot" you mean the past tense of "shoot",
but not so much if you're referring to a shot of vodka. As far as I
know, word sense disambiguation (WSD) is very much an unsolved
problem. You'll also want to get a feel for part-of-speech (POS)
tagging, which is usually a precursor to WSD.
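A toy illustration of why senses matter. The sketch below scores two senses by 1 / (1 + length of the shortest hypernym path between them), the same idea as WordNet-style path similarity. The sense names and the tiny taxonomy are invented for the example, not taken from WordNet, and taking the best-scoring sense pair is a crude stand-in for real disambiguation.

```python
# Hypothetical sense inventory: each sense points at its single hypernym.
# Verb and noun senses live in separate hierarchies, as in WordNet.
PARENT = {
    "shoot.v.kill": "kill.v",
    "assassinate.v": "murder.v",
    "murder.v": "kill.v",
    "shot.n.drink": "drink.n",
    "drink.n": "food.n",
}

SENSES = {
    "shot": ["shoot.v.kill", "shot.n.drink"],
    "assassinated": ["assassinate.v"],
}

def ancestors(sense):
    """Map the sense and each of its hypernyms to its distance from the sense."""
    dist = {}
    step = 0
    while sense is not None:
        dist[sense] = step
        sense = PARENT.get(sense)
        step += 1
    return dist

def path_similarity(a, b):
    """1 / (1 + shortest hypernym-path length); 0.0 if no shared ancestor."""
    da, db = ancestors(a), ancestors(b)
    common = da.keys() & db.keys()
    if not common:
        return 0.0
    return 1.0 / (1 + min(da[c] + db[c] for c in common))

def word_similarity(w1, w2):
    """Without WSD, fall back to the best-scoring pair of senses."""
    return max(path_similarity(s1, s2)
               for s1 in SENSES[w1] for s2 in SENSES[w2])

print(path_similarity("shoot.v.kill", "assassinate.v"))  # 0.25
print(path_similarity("shot.n.drink", "assassinate.v"))  # 0.0
print(word_similarity("shot", "assassinated"))           # 0.25
```

The "kill" sense of "shot" meets "assassinated" at the shared ancestor "kill.v", while the drink sense never does, because noun and verb senses sit in separate trees. Taking the maximum hides that difference, which is what WSD is meant to resolve; knowing the POS (verb "shot" vs. noun "shot") already prunes half the candidates here.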

It's an interesting problem, and I enjoyed working on it. I don't
have any papers handy, but search for deep parsing and semantic
similarity in the context of natural language processing and you'll
get a feel for the field.

-lance


On Jul 28, 2:34 pm, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> Wordnet is the main existing thing that comes to mind as related to your
> idea.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
