William wrote:
> Hello, I am editing the WordNet (http://wordnet.princeton.edu/) dictionary
> files to add my own words. The WordNet database files look like
> normal text files, and I am editing them with vim, but whenever I add a word it
> causes the Perl "seek" function to work incorrectly.
> 
> Here are the first *TWO* lines of the data.noun file, with the part that I have
> added on the first line: "entity2 0"
> 00001740 03 n 02 entity 0 entity2 0 003 ~ 00001930 n 0000 ~ 00002137 n
> 0000 ~ 04424418 n 0000 | that which is perceived or known or inferred
> to have its own distinct existence (living or nonliving)
> 00001930 03 n 01 physical_entity 0 007 @ 00001740 n 0000 ~ 00002452 n
> 0000 ~ 00002684 n 0000 ~ 00007347 n 0000 ~ 00020827 n 0000 ~ 00029677 n
> 0000 ~         14580597 n 0000 | an entity that has physical existence
> 
> 
> 
> This happens in
> the WordNet::QueryData module at
> http://search.cpan.org/~jrennie/WordNet-QueryData-1.47/QueryData.pm
> in the WordNet::QueryData::getSense function, lines 612-613:
> 
> 612: seek $fh, $offset, 0;
> 613: my $line = <$fh>;
> 
> 
> # $fh is the filehandle to data.noun
> # Perl debugger
> DB<51> x $offset
> 0  00001930
> # Here is where the seek function gets the wrong data:
> DB<48> x $line
> 0  'iving)  
> # $line is supposed to be
> 00001930 03 n 01 physical_entity 0 007 @ 00001740 n 0000 ~ 00002452 n
> 0000 ~ 00002684 n 0000 ~ 00007347 n 0000 ~ 00020827 n 0000 ~ 00029677 n
> 0000 ~         14580597 n 0000 | an entity that has physical existence
> 
> 
> This Perl code is enough to cause the error:
> use WordNet::QueryData;
> my $wn = new WordNet::QueryData;
> print $wn->querySense("entity#n#1","hypo");
> 
> (getSense) Internal error: offset=00001930 pos=n at 
> /usr/local/lib/perl5/site_perl/5.10.0/WordNet/QueryData.pm line 622, <GEN8> 
> line 2.
> 
> 
> Is it because those files are not normal text files?
> But according to them:
>   What is the format of the WordNet database?
>   The (ASCII) database format is well-documented. See WordNet documentation
>   index, specifically WordNet man page: wndb.5WN.
> I have spent more than 24 hours trying to solve this, but I still have no
> clue. Please guide me.
> Thank you very much,

Have you also modified the index.noun file to account for your changes?
index.noun contains a list of byte offsets into data.noun, and any changes to
the latter mean the former is invalid.
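
To see the effect in isolation, here is a minimal sketch that uses a made-up
two-record file rather than the real data.noun. The record contents and the
helper name line_at are assumptions for illustration only. As in the WordNet
database format, each record begins with its own byte offset, and the pointer
fields inside data.noun use those same offsets, so an edit that changes the
length of one record can break lookups even before index.noun is consulted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two made-up records; each starts with its own byte offset,
# in the style of data.noun (contents are illustrative only).
my $first  = "00000000 n 01 entity 0 | first record\n";
my $second = sprintf("%08d n 01 physical_entity 0 | second record\n",
                     length $first);
my $offset = length $first;    # offset recorded for the second record

# Hypothetical helper: write $content to a temporary file, then read
# the line that starts at byte $offset, the same way getSense does.
sub line_at {
    my ($content, $offset) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh;
    open my $in, '<:raw', $name or die "Cannot open $name: $!";
    seek $in, $offset, 0;
    my $line = <$in>;
    close $in;
    return $line;
}

# Unedited file: the recorded offset lands on the start of the record.
my $before = line_at($first . $second, $offset);

# Insert ten bytes into the first record without updating any offsets.
(my $edited = $first) =~ s/entity 0 /entity 0 entity2 0 /;
my $after = line_at($edited . $second, $offset);

print "before: $before";
print "after:  $after";
```

With the edited file, the old offset falls inside the first record, so the
read returns only its tail, much like the 'iving)' fragment in your debugger
output.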

Alternatively, what platform are you working on? Records in the WordNet
files must be terminated by just a single "\x0A". If you are working on a
non-Unix platform that uses a multi-character record separator, then the records
will be a different length, which invalidates the offsets in the index file.
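
One way to check for this is to look for a carriage return in front of each
newline. This is only a rough sketch: the helper name count_crlf and the idea
of passing the path on the command line are assumptions for illustration, not
part of WordNet::QueryData.

```perl
use strict;
use warnings;

# Count the records that end in "\x0D\x0A" instead of a bare "\x0A".
sub count_crlf {
    my ($path) = @_;
    # :raw stops Perl from translating line endings while reading.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\x0D\x0A\z/;
    }
    close $fh;
    return $count;
}

# Usage: perl check_eol.pl /path/to/data.noun
if (@ARGV) {
    my $n = count_crlf($ARGV[0]);
    print "$ARGV[0]: $n line(s) end in CR LF\n";
}
```

A count of zero means every record ends in a single "\x0A", and the line
endings are not the cause.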

HTH,

Rob
