On Jul 17, 2009, at 3:24 PM, Ulrike Grömping wrote:


David,

thanks. Your explanation does not quite fit, though, as it refers to using function data.frame, while I assigned the new column with $<-. poly() does
return an object of classes poly and matrix, not model.matrix,

But model.matrix is not a class as far as I can tell. It has no "is.<>" function, and examining a sample model matrix does not indicate that it carries a special class attribute.
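
The class question above can be checked directly. What follows is only a minimal sketch; the object names x, p and mm are illustrative and do not come from the thread.

```r
# Illustrative objects; these names are assumptions, not from the original posts.
x  <- 1:10
p  <- poly(x, 3)          # orthogonal polynomial basis, 10 rows by 3 columns
mm <- model.matrix(~ x)   # design matrix for an intercept plus x

class(p)                      # includes "poly" and "matrix"
inherits(p, "model.matrix")   # FALSE
is.null(attr(mm, "class"))    # TRUE: no explicit class attribute is set
names(attributes(mm))         # the attributes the design matrix does carry
```

So the object returned by poly() carries an explicit class attribute, while the result of model.matrix() is distinguished only by its other attributes.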

and handing a
poly object to function data.frame does behave like I would expect it to:

dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
dat <- data.frame(dat, X1poly = poly(dat$X1,3))
dat         ## five columns displayed
ncol(dat)  ## returns 5
colnames(dat) ## returns a vector of 5 names

It is just the assignment with "$" that behaves differently, and not
only for poly objects but for any matrix object. After I eventually
remembered how to get to the documentation of extractors
(?"$<-.data.frame"), I found this behavior documented there in the
section on Coercion. Nevertheless, this does seem to contradict the
understanding of what a data frame is. I am aware that data frames are
lists, but they are of course special lists, requiring that all list
elements have the same number of rows. So far I thought that all list
elements also have the same number of columns, namely just one. In fact,
the documentation of function data.frame states that

"A data frame is a list of variables of the same length with unique row
names, given class "data.frame".",

which would imply such a rule.

Except that the same page asserts:

"Note that when the replacement value is an array (including a matrix) it is not treated as a series of columns (as data.frame and as.data.frame do) but inserted as a single column."

... which is more on-point documentation than what I offered earlier. I also found that the I() construct within data.frame() replicates the behavior of df$x <- <mtx> (as documented in data.frame's help):

> dat2 <- data.frame(X1=1:10, X2=LETTERS[1:10], X1poly = I(poly(dat$X1,3)))
> length(dat2)
[1] 3
> dat2[1,3]
              1        2          3
[1,] -0.4954337 0.522233 -0.4534252
attr(,"class")
[1] "poly"   "matrix"
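
The coercion note quoted above can be illustrated with a plain matrix. This is a hedged sketch: the names d, m and d2 are made up for the illustration and do not appear in the thread.

```r
# Hypothetical small example; d, m and d2 are illustrative names.
d <- data.frame(id = 1:4)
m <- matrix(1:8, nrow = 4)   # 4 x 2 matrix, row count matches nrow(d)

d$m <- m                     # "$<-" inserts the matrix as ONE column
ncol(d)                      # 2
dim(d$m)                     # 4 2

d2 <- data.frame(d["id"], m = m)   # data.frame() splits it into columns
ncol(d2)                           # 3
```

The matrix keeps its dimensions inside d, so individual parts remain reachable with ordinary matrix indexing such as d$m[, 2].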

The possibility of a matrix with more than one column being a column of
the data frame contradicts this piece of documentation, since the length
of the matrix is not the same as the length of the other columns (e.g.
length(poly(dat$X1,3)) is 30, not 10 as for the other variables). Or
would one consider the columns of the matrix X1poly the variables, but
X1poly a column? I'm not trying to be difficult; I just find this quite
confusing and wonder about the consequences when using such a data frame
in analyses.

There could be unforeseen consequences, but I am not the right person to answer for all of those possibilities. I can see another instance where it would be desirable to have tuples included in data.frames as arrays, and that is in the representation of complex numbers, but it appears that the internal representation of complex numbers is more completely hidden from casual view than is the capacity of data.frames to carry matrices. If you have a compelling argument to change the behavior of [<-.data.frame, you will need to take it up with the developers.
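
For analyses that expect one variable per column, one possible workaround (a sketch only, not something proposed in the thread) is to rebuild the data frame through data.frame(), which expands matrix arguments. The name flat is illustrative.

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)      # matrix column, as in the thread

length(dat$X1poly)   # 30, the number of matrix cells
NROW(dat$X1poly)     # 10, matches the other columns

# Passing the list elements back to data.frame() expands the matrix.
flat <- do.call(data.frame, dat)   # 'flat' is an illustrative name
ncol(flat)           # 5
```

The row count, not length(), is what the matrix column shares with the other variables.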

Best regards,
David.


Regards, Ulrike


David Winsemius wrote:

Dataframes are lists. Look at dat with str and you will see that the
third column (actually the third list element) is a matrix. It's not
hard to find the documentation. If you read the documentation on the
help page for data.frame you should see this:

"If a list or data frame or matrix is passed to data.frame it is as if
each component or column had been passed as a separate argument
(except for matrices of class "model.matrix" and those protected by I)."

It seems reasonable that poly() returns an object that is considered a
model.matrix.
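
The suggestion to inspect dat with str can be tried as follows. This sketch simply rebuilds the example from the original question below.

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)

str(dat)                 # three components; the third has two dimensions
is.list(dat)             # TRUE
is.matrix(dat$X1poly)    # TRUE
length(dat)              # 3
```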

On Jul 17, 2009, at 12:54 PM, Ulrike Grömping wrote:


Dear UseRs,

I just learnt that the number of columns of a data frame is not always
what I thought it to be, and I wonder where I should have learnt about
this. Consider the following example:

dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
ncol(dat)          ## evaluates to 2 (of course)
dat$X1poly <- poly(dat$X1,3)
dat                  ## five columns displayed
ncol(dat)          ## evaluates to 3
colnames(dat)   ## three names (third is X1poly)
colnames(dat)[3] <- "newname"
dat                 ## all three previous X1poly columns renamed

This appears intentional, as it treats the column names reasonably.
Where is it documented? Are there any other scenarios for which the
number of columns displayed when printing a data frame does not coincide
with ncol?
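
One way to compare the two counts for the example above is sketched here. The helper variable widths is an illustrative name, and the approach assumes each list element is either a vector or a matrix.

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)

widths <- sapply(dat, NCOL)   # columns contributed by each list element
widths                        # X1 = 1, X2 = 1, X1poly = 3
sum(widths)                   # 5, the count shown when printing
ncol(dat)                     # 3, the number of list elements
```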

Regards, Ulrike


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help