mariolone wrote:
> Extracting the matrix was successful.
> But for collections of large documents, isn't this too expensive a
> solution?
I have a fairly small collection with 14,960 documents and 29,828 unique
terms. If I remember correctly, it took a few minutes on an ordinary laptop
to iterate over the terms and documents. I stored the matrix in MySQL:
CREATE TABLE term_document_matrix (
    term     VARCHAR(32) NOT NULL,
    document INT         NOT NULL,
    weight   DOUBLE      NOT NULL DEFAULT 0,
    PRIMARY KEY (term, document)
);
As you can see, it is not a real matrix, just an ordinary table in the
relational model. I stored only the weights greater than 0, so I have
far fewer entries than the 14,960 x 29,828 = 446,226,880 cells of the
full matrix (in my case 159,407).
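
To illustrate the sparse layout, here is a minimal sketch of the same idea. It is not the code I ran: it assumes Python with an in-memory SQLite database in place of MySQL, a made-up three-document collection (DOCS), and raw term counts as weights. The helper names build_matrix and weight are invented for the example.

```python
import sqlite3
from collections import Counter

# Hypothetical toy collection standing in for the real documents.
DOCS = {
    1: "lucene builds an inverted index",
    2: "the index maps each term to documents",
    3: "a matrix of terms and documents is mostly zeros",
}


def build_matrix(conn, docs):
    """Store one row per (term, document) pair that has a nonzero weight."""
    conn.execute(
        "CREATE TABLE term_document_matrix ("
        " term VARCHAR(32) NOT NULL,"
        " document INT NOT NULL,"
        " weight DOUBLE NOT NULL DEFAULT 0,"
        " PRIMARY KEY (term, document))"
    )
    for doc_id, text in docs.items():
        # Assumption: the weight is the raw term count in the document.
        counts = Counter(text.lower().split())
        conn.executemany(
            "INSERT INTO term_document_matrix (term, document, weight)"
            " VALUES (?, ?, ?)",
            [(term, doc_id, float(n)) for term, n in counts.items()],
        )
    conn.commit()


def weight(conn, term, doc_id):
    """Look up one matrix cell; a missing row means the weight is 0."""
    row = conn.execute(
        "SELECT weight FROM term_document_matrix"
        " WHERE term = ? AND document = ?",
        (term, doc_id),
    ).fetchone()
    return row[0] if row else 0.0


conn = sqlite3.connect(":memory:")
build_matrix(conn, DOCS)
stored = conn.execute(
    "SELECT COUNT(*) FROM term_document_matrix"
).fetchone()[0]
terms = conn.execute(
    "SELECT COUNT(DISTINCT term) FROM term_document_matrix"
).fetchone()[0]
print(f"{stored} stored cells out of {terms * len(DOCS)} in the full matrix")
```

Any cell without a row is read back as 0, so the table only grows with the number of (term, document) pairs that actually occur.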
> Is it possible to extract the matrix from the index file?
I don’t know of any API that extracts the matrix directly from the index file.
Sören