mariolone wrote:
I was able to extract the matrix. But isn't that too expensive a solution for collections of large documents?

I have a fairly small collection with 14,960 documents and 29,828 unique terms. If I remember correctly, it took a few minutes on a normal laptop to iterate over the terms and documents. I stored the matrix in MySQL:

CREATE TABLE term_document_matrix (
    term     VARCHAR(32) NOT NULL,
    document INT         NOT NULL,
    weight   DOUBLE      NOT NULL DEFAULT 0,
    PRIMARY KEY (term, document)
);

As you can see, it is not a real matrix, just a normal table in the relational model. I stored only the weights greater than 0, so I have far fewer entries than the 14,960 x 29,828 = 446,226,880 cells of the full matrix (159,407 in my case).
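
To illustrate the idea of storing only the non-zero cells, here is a minimal sketch, not the code I actually used. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, a tiny made-up corpus, and raw term counts as the weights; the names DOCS, build_sparse_matrix and weight are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical toy corpus: document ids and texts are invented for illustration.
DOCS = {
    1: "lucene index term",
    2: "term document matrix",
    3: "matrix matrix weight",
}


def build_sparse_matrix(docs):
    """Fill an in-memory table with one row per non-zero (term, document) cell."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute(
        """CREATE TABLE term_document_matrix (
               term     VARCHAR(32) NOT NULL,
               document INT         NOT NULL,
               weight   DOUBLE      NOT NULL DEFAULT 0,
               PRIMARY KEY (term, document)
           )"""
    )
    for doc_id, text in docs.items():
        # Assumption: the weight is the raw term count within the document.
        counts = Counter(text.split())
        conn.executemany(
            "INSERT INTO term_document_matrix (term, document, weight) "
            "VALUES (?, ?, ?)",
            [(term, doc_id, float(n)) for term, n in counts.items()],
        )
    conn.commit()
    return conn


def weight(conn, term, doc_id):
    """Look up one cell; a missing row means the weight is 0."""
    row = conn.execute(
        "SELECT weight FROM term_document_matrix "
        "WHERE term = ? AND document = ?",
        (term, doc_id),
    ).fetchone()
    return row[0] if row else 0.0


if __name__ == "__main__":
    conn = build_sparse_matrix(DOCS)
    stored = conn.execute(
        "SELECT COUNT(*) FROM term_document_matrix"
    ).fetchone()[0]
    n_terms = conn.execute(
        "SELECT COUNT(DISTINCT term) FROM term_document_matrix"
    ).fetchone()[0]
    print(f"stored rows: {stored}, full matrix cells: {len(DOCS) * n_terms}")
```

Any (term, document) pair without a row is treated as weight 0, which is why the table stays so much smaller than the full matrix.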

Is it possible to extract the matrix from the index file?

I don't know of any API that extracts the matrix directly from the index file.

Sören
