I am trying to solve the following exercise from
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html:
★ Use one of the predefined similarity measures to score the similarity of each of
the following pairs of words. Rank the pairs in order of decreasing similarity. How
close is your ranking to the order given here, an order that was established
experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel,
journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard,
midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement,
brother-monk, lad-brother, crane-implement, journey-car, monk-oracle,
cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland,
monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage,
noon-string.
(1) First, I put the word pairs in a list, e.g.
pairs = [('car', 'automobile'), ('gem', 'jewel'), ('journey', 'voyage')]. According to
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html, I need to put them in
the following format to calculate the semantic similarity:
wn.synset('right_whale.n.01').path_similarity(wn.synset('minke_whale.n.01')).
So I need a loop that iterates over each element of the list. How can I refer
to each word within a pair? For example, in pairs = [('car', 'automobile'),
('gem', 'jewel'), ('journey', 'voyage')], what are the indices for 'car' and
for 'automobile'? Thanks for your tips.
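
For reference, a minimal sketch of how the elements of such a list can be addressed, assuming the words are stored as quoted strings and that the first noun sense ('.n.01') is the one wanted, as in the book's example. The variable names are only illustrative, and the WordNet calls are left out so the snippet runs without the corpus installed:

```python
# Each element of `pairs` is a 2-tuple of strings.
pairs = [('car', 'automobile'), ('gem', 'jewel'), ('journey', 'voyage')]

# Two indices: the first picks the tuple, the second picks the word inside it.
first_word = pairs[0][0]    # 'car'
second_word = pairs[0][1]   # 'automobile'

# In a loop, tuple unpacking avoids the indices altogether.
synset_names = []
for word1, word2 in pairs:
    # Build the names that would be passed to wn.synset(), e.g. 'car.n.01'.
    synset_names.append((word1 + '.n.01', word2 + '.n.01'))

print(first_word, second_word)
print(synset_names)
```

Each name in synset_names could then be handed to wn.synset() inside the same loop.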
(2) Since I couldn't solve the indexing issue above, I tried using a dictionary
instead:
import nltk
from nltk.corpus import wordnet as wn

pairs = {'car': 'automobile', 'gem': 'jewel', 'journey': 'voyage'}
for key in pairs:
    # Use the first noun sense of each word.
    word1 = wn.synset(key + '.n.01')
    word2 = wn.synset(pairs[key] + '.n.01')
    similarity = word1.path_similarity(word2)
    # Parenthesised so it runs under both Python 2 and Python 3.
    print('%s-%s %s' % (key, pairs[key], similarity))

Output:
car-automobile 1.0
journey-voyage 0.25
gem-jewel 0.125
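
Note that a dictionary keyed on the first word cannot hold the full list from the exercise, because several first words repeat there (bird-cock and bird-crane; coast-shore, coast-hill and coast-forest). A small sketch of the effect, using only pairs taken from the exercise (the variable names are made up for illustration):

```python
# A few pairs from the exercise whose first word repeats.
pair_list = [('bird', 'cock'), ('bird', 'crane'),
             ('coast', 'shore'), ('coast', 'hill'), ('coast', 'forest')]

# Building a dict from them keeps only the last value seen for each key.
pair_dict = dict(pair_list)

print(len(pair_list))   # number of pairs in the list
print(len(pair_dict))   # number of entries that survive in the dict
print(pair_dict)
```

A list of tuples keeps every pair, so it is probably the safer container for the whole exercise.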
Now it seems that I can calculate the semantic similarity for each pair in
the dictionary. However, the exercise requires ranking the pairs, so I want to
sort the results by similarity value before printing them. Can dictionary
entries be sorted by value? In other words, how can I get each pair of words
(e.g. car-automobile, journey-voyage, gem-jewel) printed in order of
decreasing similarity?
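
One possible approach, sketched under the assumption that the scores have already been collected into a dictionary keyed by word pair. The numbers are copied from the output shown earlier rather than recomputed, and the names scores and ranked are only illustrative:

```python
# Scores copied from the output shown earlier, not recomputed here.
scores = {
    ('car', 'automobile'): 1.0,
    ('journey', 'voyage'): 0.25,
    ('gem', 'jewel'): 0.125,
}

# sorted() returns a new list of (pair, score) items; the key function picks
# the score, and reverse=True puts the highest similarity first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for (word1, word2), similarity in ranked:
    print('%s-%s %s' % (word1, word2, similarity))
```

The dictionary itself is left untouched; only the sorted list determines the print order.
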
Thanks for your tips.