Re: Lucene & LSA

Soeren Pekrul Thu, 14 Dec 2006 11:17:42 -0800

mariolone wrote:

They are successful to extract the matrix.But with collections of large documents is not one too much expensivesolution?

I have a quite small collection with 14,960 documents and 29,828 uniqueterms. If I remember right it took a few minutes on a normal laptopcomputer to iterate the terms and documents. I stored the matrix in mySQL:


CREATE TABLE term_document_matrix (
        term VARCHAR( 32 ) NOT NULL ,
        document INT NOT NULL ,
        weight DOUBLE NOT NULL DEFAULT '0',
        PRIMARY KEY (term, document)
);

You can see it is not a real matrix just a normal table in therelational model. I stored the weights greater than 0 only, so I havemuch less entries than 14,960 x 29,828 = 446,226,880 (in my case 159,407).

it is possible to extract the matrix from the indexing file?


I don’t know any API to extract the matrix from the index file directly.

Sören

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene & LSA

Reply via email to