While using the Lucene WordNet package, we found that the Syns2Index program indexes the Synsets wrongly. For example, looking up the synsets for the word "king", we get:
java SynLookup wnindex king baron magnate mogul power queen rex scrofula struma tycoon Here, "scrofula" and "struma" are extraneous. This happens because, the line parser code in Syns2Index.java interpretes the two consecutive single quotes in entry s(114144247,3,'king''s evil',n,1,1) in wn_s.pl file, as termination of the string and separates into "king". This entry concerns synset of words "scrofula" and "struma", and thus they get inserted in the synset of "king". *There 1382 such entries, in wn_s.pl* and more in other WordNet Prolog data-base files, where such use of two consecutive single quotes appears. We have resolved this by adding a statement in the line parsing portion of Syns2Index.java, as follows: // parse line line = line.substring(2); * line = line.replaceAll("\'\'", "`"); // added statement* int comma = line.indexOf(','); String num = line.substring(0, comma); ... ... etc. In short we replace "''" by "`" (a back-quote). Then on recreating the index, we get: java SynLookup zwnindex king baron magnate mogul power queen rex tycoon *Recently lucene-2.9.0 has been released, but wordnet package included in it still has the same problem given above.* -- Parag H. Dave