Have a look at Mahout (a Lucene sister project), which can create SparseVectors 
from Lucene term vectors, where each entry is a term id and the "weight" of 
that term.  It should be trivial to replicate what Mahout does for LibSVM, ARFF, 
or whatever format you need.
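
For illustration only, here is a minimal sketch of the term id/weight idea in 
plain Java. It is not Mahout's implementation. It assumes you have already 
pulled the parallel arrays out of a term vector with 
TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies(); the class 
name TermIdEncoder and its methods are made up for this example, and the 
dictionary is an ordinary in-memory Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene or Mahout): assigns each distinct
 * term an integer id and renders term/frequency pairs as "id:freq" entries.
 */
final class TermIdEncoder {
    // Term -> id dictionary shared across all documents; ids start at 1.
    private final Map<String, Integer> ids = new LinkedHashMap<String, Integer>();

    /** Returns the id of a term, assigning the next free id on first sight. */
    int idOf(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(term, id);
        }
        return id;
    }

    /** Encodes parallel term/frequency arrays as "id:freq id:freq ...". */
    String encode(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(idOf(terms[i])).append(':').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TermIdEncoder enc = new TermIdEncoder();
        // Stand-ins for getTerms() / getTermFrequencies() of one document.
        System.out.println(enc.encode(new String[] {"one", "two"}, new int[] {3, 4}));
        // A second document reuses the id already assigned to "two".
        System.out.println(enc.encode(new String[] {"two", "three"}, new int[] {1, 5}));
    }
}
```

Entries come out in the order the terms are passed in, not sorted by id. 
LIBSVM's input format expects indices in ascending order and a label at the 
start of each line, so you would need to sort the entries and prepend a label 
before feeding the output to it.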

On Jan 18, 2010, at 9:07 AM, Solt, Illés wrote:

> Hi,
> 
> I am looking for a way to represent term frequency data in a vector space, 
> that is, using unique integer identifiers instead of strings. This would allow 
> feeding tools like LIBSVM from a Lucene index.
> 
> A small example: TermFreqVector.toString() produces "{TITLE: one/3, two/4}". 
> What I am looking for is "1:3 2:4", where 1 and 2 are arbitrary identifiers; 
> sortedness is not an issue.
> 
> The task can obviously be solved using some Java Map, but that should be less 
> efficient than using native Lucene methods.
> 
> I am using 2.9.1, my index can be considered constant.
> 
> 
> Thanks,
> Illes Solt
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search

