Hi, I downloaded the Bag of Words dataset from the UCI repository: http://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/readme.txt
The dataset is in a text file with the following structure:

---
docID1 wordID1 count
docID1 wordID2 count
docID1 wordID3 count
docID1 wordID4 count
...
docID2 wordID2 count
docID2 wordID5 count
docID2 wordID6 count
---

where docIDx is an integer that identifies document x, wordIDy is an integer that identifies word y, and count is an integer giving the number of times wordIDy appears in docIDx.

Example:

---
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
---

I would like to import the file into a matrix (not sparse) where:

the wordIDy would correspond to the column [,y]
the docIDx would correspond to the row [x,]
the value in [x,y] would be the count of wordIDy in the docIDx

So, for the previous example it would look like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    3   54   11   17    0
[2,]    5    0    0   78   20

I don't have a clue about how to do this. Can someone please help me?

Thank you,
Rui
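
For reference, one possible approach to the conversion described in the post is sketched below in base R. It is only a sketch: the name `triples`, the column names `doc`, `word` and `count`, and the use of inline text instead of the real file are assumptions made for illustration. It fills a zero matrix by indexing with a two-column (row, column) matrix.

```r
# Triples from the example in the post: docID, wordID, count.
# The column names are hypothetical, chosen for this illustration.
triples <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", col.names = c("doc", "word", "count"))

# Dense matrix of zeros: one row per document, one column per word.
# This assumes the IDs are positive integers starting at 1.
m <- matrix(0L, nrow = max(triples$doc), ncol = max(triples$word))

# Index with a two-column matrix of (row, column) pairs so that each
# count lands in cell [docID, wordID]; untouched cells stay 0.
m[cbind(triples$doc, triples$word)] <- triples$count

print(m)
```

For the real file, `read.table()` would be pointed at the file path instead of `text =`. If the file begins with header lines, as the readme may describe, they would need to be skipped with the `skip` argument. A dense matrix for a large collection may not fit in memory, in which case something like `Matrix::sparseMatrix()` could be a fallback, although the post asks for a non-sparse result.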