Hi,

I downloaded the Bag of Words dataset from the UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/readme.txt


The dataset is in a text file with the following structure:
---

docID1 wordID1 count
docID1 wordID2 count
docID1 wordID3 count
docID1 wordID4 count
...
docID2 wordID2 count
docID2 wordID5 count
docID2 wordID6 count
---

Here docIDx is an integer that identifies document x, wordIDy is an
integer that identifies word y, and count is an integer giving the
number of times wordIDy appears in docIDx.


Example:

---

1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
---

I would like to import the file into a matrix (not sparse) where:

the wordIDy would correspond to the column [,y]

the docIDx would correspond to the row [x,]

the value in [x,y] would be the count of wordIDy in the docIDx

So, for the previous example, the matrix would be:


     [,1] [,2] [,3] [,4] [,5]
[1,]    3   54   11   17    0
[2,]    5    0    0   78   20


I don't have a clue about how to do this.

Can someone please help me?

Thank you

Rui
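
For reference, the mapping described in the question (docID to row, wordID to column, count as the cell value) could be sketched in R roughly as follows. This is only one possible approach, not a definitive answer. It uses the small example from the post as inline text; the column names doc, word and count are made up for illustration. For a real file, read.table() would be pointed at the file instead, and the UCI docword files appear to start with a few header lines, in which case a skip argument would be needed (an assumption to check against the readme).

```r
# Read the example triples from the post (docID, wordID, count).
# The column names are hypothetical labels chosen for this sketch.
triples <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20", col.names = c("doc", "word", "count"))

# Allocate a dense matrix of zeros: one row per document, one column per word.
m <- matrix(0L, nrow = max(triples$doc), ncol = max(triples$word))

# Index with a two-column (row, column) matrix to fill every count in one step.
m[cbind(triples$doc, triples$word)] <- triples$count

m
```

Pairs that never occur in the file simply stay at zero. A dense matrix needs one cell for every document/word combination, so this sketch may not fit in memory for the larger collections in the dataset.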


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
