Re: How to get the number of unique terms in the inverted index

kannan chandrasekaran Thu, 27 May 2010 18:17:34 -0700

I am just running a few experiments to calculate the similarity between terms 
based on their co-occurrences in the dataset. Basically, I am trying to build 
contextual vectors and compare them using a similarity measure (say, 
cosine similarity).

I don't think this is an XY problem. The vectors I am trying to build are not 
the same as Lucene's TermVectors option ((term, freq) pairs per document), if 
that's what you meant.

Thanks
Kannan

________________________________

OK, let's back up a level. WHY are you building these
vectors? Where I'm going with this is I wonder if this
is an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Thu, May 27, 2010 at 7:49 PM, kannan chandrasekaran
<ckanna...@yahoo.com> wrote:

> Uwe,
>
> I now see the problem with overlapping terms across segments. Thanks.
>
> Erick,
>
> Good point. My use case is this:
>
> I am trying to build vectors for individual terms and documents, and I need
> to know the size to handle memory constraints.
>
> Thanks
> Kannan
>
>
>
