I am just trying out a few experiments to calculate similarity between terms based on their co-occurences in the dataset... Basically I am trying to build contextual vectors and calculate similarity using a similarity measure ( say cosine similarity).....
I dont think this is an XY problem . The vectors I am trying to build are not the same as the TermVectors option ((term,freq) pairs per document) in the lucene ( if thats what u meant) Thanks Kannan ________________________________ OK, let's back up a level. WHY are you building these vectors? Where I'm going with this is I wonder if this is an XY problem, see: http://people.apache.org/~hossman/#xyproblem Best Erick On Thu, May 27, 2010 at 7:49 PM, kannan chandrasekaran <ckanna...@yahoo.com>wrote: > Uwe, > > I now see the problem with overlapping terms across segments...Thanks... > > Erik, > > Good point...My usecase for this is , > > I am trying to build vectors for individual terms and documents and I need > to know the size to handle memory constraints > > Thanks > Kannan > > >