Thank you for your responses. I guess I'm not alone. I don't find that the documentation goes into any detail.
I also find it surprising that

> object.size(train$data)
1730904 bytes
> object.size(as.matrix(train$data))
6575016 bytes

the dgCMatrix actually takes less memory, though it *looks* like the opposite.

Cheers!

On Fri, Oct 20, 2017 at 3:22 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> > On Oct 20, 2017, at 11:11 AM, C W <tmrs...@gmail.com> wrote:
> >
> > Dear R list,
> >
> > I came across dgCMatrix. I believe this class is associated with sparse
> > matrix.
>
> Yes. See:
>
> help('dgCMatrix-class', pack=Matrix)
>
> If Martin Maechler happens to respond to this you should listen to him
> rather than anything I write. Much of what the Matrix package does appears
> to be magical to one such as I.
>
> >
> > I see there are 8 attributes to train$data, I am confused why are there so
> > many, some are vectors, what do they do?
> >
> > Here's the R code:
> >
> > library(xgboost)
> > data(agaricus.train, package='xgboost')
> > data(agaricus.test, package='xgboost')
> > train <- agaricus.train
> > test <- agaricus.test
> > attributes(train$data)
> >
>
> I got a bit of an annoying surprise when I did something similar. It
> appeared to me that I did not need to load the xgboost library since all
> that was being asked was "where is the data" in an object that should be
> loaded from that library using the `data` function. The last command asking
> for the attributes filled up my console with a 100K length vector (actually
> 2 of such vectors). The `str` function returns a more useful result.
>
> > data(agaricus.train, package='xgboost')
> > train <- agaricus.train
> > names( attributes(train$data) )
> [1] "i" "p" "Dim" "Dimnames" "x" "factors" "class"
> > str(train$data)
> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
>   ..@ i       : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
>   ..@ p       : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991 ...
>   ..@ Dim     : int [1:2] 6513 126
>   ..@ Dimnames:List of 2
>   .. ..$ : NULL
>   .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
>   ..@ x       : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
>   ..@ factors : list()
>
> > Where is the data, is it in $p, $i, or $x?
>
> So the "data" (meaning the values of the sparse matrix) are in the @x
> leaf. The values all appear to be the number 1. The @i leaf is the sequence
> of row locations for the values entries while the @p items are somehow
> connected with the columns (I think, since 127 and 126=number of columns
> from the @Dim leaf are only off by 1).
>
> Doing this > colSums(as.matrix(train$data))
>           cap-shape=bell       cap-shape=conical
>                      369                       3
>         cap-shape=convex          cap-shape=flat
>                     2934                    2539
>        cap-shape=knobbed        cap-shape=sunken
>                      644                      24
>      cap-surface=fibrous     cap-surface=grooves
>                     1867                       4
>        cap-surface=scaly      cap-surface=smooth
>                     2607                    2035
>          cap-color=brown          cap-color=buff
>                     1816
> # now snipping the rest of that output.
>
> Now this makes me think that the @p vector gives you the cumulative sum of
> number of items per column:
>
> > all( cumsum( colSums(as.matrix(train$data)) ) == train$data@p[-1] )
> [1] TRUE
>
> >
> > Thank you very much!
> >
> > [[alternative HTML version deleted]]
>
> Please read the Posting Guide. Your code was not mangled in this instance,
> but HTML code often arrives in an unreadable mess.
>
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
> -Gehm's Corollary to Clarke's Third Law
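
For anyone who finds this thread later, here is a small sketch of how the @i, @p and @x slots fit together. It uses a made-up 4 x 3 matrix (the names m, s, j and rebuilt are mine, not from the xgboost data) and assumes only that the Matrix package is installed. As far as I understand the compressed-column layout, @p holds cumulative counts of *stored entries* per column; the cumsum(colSums(...)) check quoted above matches only because every stored value in the agaricus data is 1.

```r
library(Matrix)  # provides the dgCMatrix class

# Toy dense matrix (hypothetical data); matrix() fills column by column.
m <- matrix(c(1, 0, 0, 2,
              0, 0, 3, 0,
              4, 0, 0, 5), nrow = 4, ncol = 3)

# Non-square numeric input, so this should come back as a dgCMatrix.
s <- Matrix(m, sparse = TRUE)

s@x        # the stored (nonzero) values, column by column
s@i        # 0-based row index of each stored value
s@p        # 0-based offsets: column k's values sit at positions p[k]+1 .. p[k+1]
diff(s@p)  # number of stored values in each column

# Rebuild the dense matrix from the three slots alone.
j <- rep(seq_len(ncol(s)), times = diff(s@p))  # column index of each stored value
rebuilt <- matrix(0, nrow(s), ncol(s))
rebuilt[cbind(s@i + 1L, j)] <- s@x             # +1 because @i is 0-based

all(rebuilt == m)
```

This also seems to account for the memory figures, at least roughly: the sparse form keeps one 4-byte integer and one 8-byte double per stored value (143286 of them) plus the 127 offsets in @p, about 1.72 MB, while the dense form keeps an 8-byte double for each of the 6513 x 126 cells, about 6.57 MB. The remainder in both object.size() results would be dimnames and object overhead.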