William wrote:
> Hello, I am editing the WordNet http://wordnet.princeton.edu/ dictionary
> files to add my own words into it. The database file of WordNet looks like
> a normal text file and I am editing it with vim, but whenever I add a word
> it causes the Perl "seek" function to work incorrectly.
>
> Here are the first *TWO* lines of the data.noun file with the part that I
> have added on the first line, "entity2 0"
> 00001740 03 n 02 entity 0 entity2 0 003 ~ 00001930 n 0000 ~ 00002137 n
> 0000 ~ 04424418 n 0000 | that which is perceived or known or inferred
> to have its own distinct existence (living or nonliving)
> 00001930 03 n 01 physical_entity 0 007 @ 00001740 n 0000 ~ 00002452 n
> 0000 ~ 00002684 n 0000 ~ 00007347 n 0000 ~ 00020827 n 0000 ~ 00029677 n
> 0000 ~ 14580597 n 0000 | an entity that has physical existence
>
> It happens in the WordNet::QueryData module at
> http://search.cpan.org/~jrennie/WordNet-QueryData-1.47/QueryData.pm
> in the WordNet::QueryData::getSense function, lines 612 - 613
>
> 612: seek $fh, $offset, 0;
> 613: my $line = <$fh>;
>
> # $fh is the filehandle to data.noun
> # Perl debugger
> DB<51> x $offset
> 0 00001930
> # Here is the part where the seek function gets the wrong data,
> DB<48> x $line
> 0 'iving)
> # The $line is supposed to be
> 00001930 03 n 01 physical_entity 0 007 @ 00001740 n 0000 ~ 00002452 n
> 0000 ~ 00002684 n 0000 ~ 00007347 n 0000 ~ 00020827 n 0000 ~ 00029677 n
> 0000 ~ 14580597 n 0000 | an entity that has physical existence
>
> This Perl code is enough to cause the error:
> use WordNet::QueryData;
> my $wn = new WordNet::QueryData;
> print $wn->querySense("entity#n#1","hypo");
>
> (getSense) Internal error: offset=00001930 pos=n at
> /usr/local/lib/perl5/site_perl/5.10.0/WordNet/QueryData.pm line 622, <GEN8>
> line 2.
>
> Is it because those files are not normal text files?
> But according to them:
> What is the format of the WordNet database?
> The (ASCII) database format is well-documented. See WordNet documentation
> index, specifically WordNet man page: wndb.5WN.
>
> I have spent more than 24 hours trying to solve this, but still have no
> clue. Please guide me.
> Thank you very much,

Have you also modified the index.noun file to account for your changes?
index.noun contains a list of byte offsets into data.noun, so any change
to the length of a record in data.noun invalidates those offsets.

Alternatively, which platform are you working on? Records in the WordNet
files must be terminated by just a single "\x0A". If you are working on a
non-Unix platform that uses a multi-character record separator, the
records will be a different length, which again invalidates the index file.

HTH,

Rob
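
Both suggestions above can be checked mechanically. As described in wndb(5WN), the first field of each record in a data file is that record's own byte offset, and pointer fields such as "~ 00001930" carry the offsets of other records, so an edit that lengthens one record also invalidates offsets stored inside data.noun itself. The following is a minimal sketch, not part of WordNet::QueryData: `check_offsets` is a hypothetical helper name, and it assumes that every record line starts with an 8-digit offset followed by a space and that any other line (such as the licence header) can be skipped. It only reports problems; it does not renumber anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): compare the
# 8-digit offset at the start of each record with the byte position
# at which that record actually begins, and flag carriage returns.
sub check_offsets {
    my ($path) = @_;
    # ':raw' so that no newline translation hides "\r\n" line endings
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @problems;
    while (1) {
        my $pos  = tell $fh;      # byte offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        push @problems, "CR found in line at byte $pos"
            if $line =~ /\r/;
        # Assumption: anything not starting with an 8-digit field is header text
        next unless $line =~ /^(\d{8}) /;
        push @problems, "record claims offset $1 but starts at byte $pos"
            if $1 != $pos;
    }
    close $fh;
    return @problems;
}

# Usage: perl check_offsets.pl /path/to/dict/data.noun
if (@ARGV) {
    my @problems = check_offsets($ARGV[0]);
    print "$_\n" for @problems;
    print "offsets look consistent\n" unless @problems;
}
```

Run against an unmodified data.noun this should report nothing. After an edit like the one quoted above, every record following the edited one should be reported, which shows how much of the file (and of the matching index file) would need its offsets rewritten.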