On Jul 17, 2009, at 3:24 PM, Ulrike Grömping wrote:
David,
thanks. Your explanation does not quite fit, though, as it refers to using
function data.frame, while I assigned the new column with $<-. poly() does
return an object of classes poly and matrix, not model.matrix,
But model.matrix is not a class as far as I can tell. It has no
"is.<>" function, and examining a sample model matrix does not
indicate that it carries a special class attribute.
and handing a poly object to function data.frame does behave like I would
expect it to:
dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
dat <- data.frame(dat, X1poly = poly(dat$X1,3))
dat ## five columns displayed
ncol(dat) ## returns 5
colnames(dat) ## returns a vector of 5 names
It is just the assignment with "$" that behaves differently, and not only
for poly objects but for any matrix object. After I eventually remembered
how to get to the documentation of extractors (?"$<-.data.frame"), I found
this behavior documented there in the section on Coercion. Nevertheless,
this does seem to contradict the understanding of what a data frame is. I am
aware that data frames are lists, but they are of course special lists,
requiring that all list elements have the same number of rows. So far I
thought that all list elements also have the same number of columns, namely
just one. In fact, the documentation of function data.frame states that
"A data frame is a list of variables of the same length with unique row
names, given class "data.frame".",
which would imply such a rule.
Except that the same page asserts:
"Note that when the replacement value is an array (including a matrix)
it is not treated as a series of columns (as data.frame and
as.data.frame do) but inserted as a single column."
... which is more on point documentation than what I offered earlier.
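A minimal sketch of the difference that note describes, assuming a current R version and the same toy data used elsewhere in this thread (the names m, via_dollar and via_df are only for illustration):

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
m <- poly(dat$X1, 3)              # a 10 x 3 matrix of class "poly"

# `$<-` stores the whole matrix as one list element, i.e. one "column"
via_dollar <- dat
via_dollar$X1poly <- m
ncol(via_dollar)                  # 3
dim(via_dollar$X1poly)            # 10 3

# data.frame() treats each matrix column as a separate argument
via_df <- data.frame(dat, X1poly = m)
ncol(via_df)                      # 5
```

Both objects print with five columns; only ncol(), names() and length() reveal the difference.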
I also found that wrapping the matrix in I() within data.frame() replicates
the behavior of df$x<-<mtx> (as documented in data.frame's help):
> dat2 <- data.frame(X1=1:10, X2=LETTERS[1:10], X1poly = I(poly(dat$X1,3)))
> length(dat2)
[1] 3
> dat2[1,3]
1 2 3
[1,] -0.4954337 0.522233 -0.4534252
attr(,"class")
[1] "poly" "matrix"
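For comparison, a short sketch of how an I()-protected matrix column can be inspected, assuming a current R version (printed output may differ in detail from the 2009 session above). The argument is named with = here so that the column is called X1poly:

```r
dat2 <- data.frame(X1 = 1:10, X2 = LETTERS[1:10],
                   X1poly = I(poly(1:10, 3)))

ncol(dat2)                        # 3: the matrix counts as one column
is.matrix(dat2$X1poly)            # TRUE
dim(dat2$X1poly)                  # 10 3
inherits(dat2$X1poly, "AsIs")     # TRUE: I() adds the "AsIs" class
```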
The possibility of a matrix with more than one column being a column of the
data frame contradicts this piece of documentation, since the length of the
matrix is not the same as the length of the other columns (e.g.
length(poly(dat$X1,3)) is 30, not 10 like for the other variables). Or would
one consider the columns of the matrix X1poly the variables, but X1poly a
column? I'm not trying to be difficult, I just find this quite confusing and
wonder about the consequences when using such a data frame in analyses.
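One way to probe both points, sketched under the assumption of a current R version: what has to match across the elements of a data frame is the number of rows, not length(), and a matrix column can be used directly as a term in a model formula. The response y below is made up purely for illustration.

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)

length(dat)                # 3: number of list elements
length(dat$X1poly)         # 30: a matrix is a vector with a dim attribute
NROW(dat$X1poly)           # 10: agrees with nrow(dat)

set.seed(1)
dat$y <- rnorm(10)         # hypothetical response, illustration only
fit <- lm(y ~ X1poly, data = dat)
length(coef(fit))          # 4: intercept plus one coefficient per matrix column
```

This does not rule out surprises with functions that assume one vector per column; it only shows that model fitting handles the matrix column.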
There could be unforeseen consequences, but I am not the right person to
answer for all of those possibilities. I can see another instance
where it would be desirable to have tuples included in data.frames as
arrays and that is in the representation of complex numbers, but it
appears that the internal representation of complex numbers is more
completely hidden from casual view than is the capacity of data.frames
to carry matrices. If you have a compelling argument to change the
behavior of [<-.data.frame, you will need to take it up with the
developers.
Best Regards;
David.
Regards, Ulrike
David Winsemius wrote:
Dataframes are lists. Look at dat with str and you will see that the
third column (actually the third list element) is a matrix. It's not
hard to find the documentation. If you read the documentation on the
help page for data.frame you should see this:
"If a list or data frame or matrix is passed to data.frame it is as if
each component or column had been passed as a separate argument
(except for matrices of class "model.matrix" and those protected by I)."
It seems reasonable that poly() returns an object that is considered a
model.matrix.
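The suggestion to look at dat with str, and the guess about the class of the poly() result, can both be checked along these lines (a sketch assuming a current R version):

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)

str(dat)                               # the third element is shown as a 10 x 3 matrix
sapply(dat, is.matrix)                 # only X1poly is a matrix
class(dat$X1poly)                      # "poly" "matrix"
inherits(dat$X1poly, "model.matrix")   # FALSE
```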
On Jul 17, 2009, at 12:54 PM, Ulrike Grömping wrote:
Dear UseRs,
I just learnt that the number of columns of a data frame is not always what
I thought it to be, and I wonder where I should have learnt about this.
Consider the following example:
dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
ncol(dat) ## evaluates to 2 (of course)
dat$X1poly <- poly(dat$X1,3)
dat ## five columns displayed
ncol(dat) ## evaluates to 3
colnames(dat) ## three names (third is X1poly)
colnames(dat)[3] <- "newname"
dat ## all three previous X1poly columns renamed
This appears intentional, as it treats the column names reasonably. Where is
it documented? Are there any other scenarios for which the number of columns
displayed when printing a data frame does not coincide with ncol?
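One further scenario of this kind, offered as a sketch assuming a current R version rather than a complete answer: a data frame assigned with $<- is also stored as a single column. Passing the list elements back through data.frame() is one possible way to flatten such an object (the names inner and flat are illustrative).

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$inner <- data.frame(a = 1:10, b = 10:1)   # a data frame stored as one column

ncol(dat)                  # 3, although printing dat shows four columns
names(dat)                 # "X1" "X2" "inner"

# Flatten by handing each list element to data.frame() as a separate argument
flat <- do.call(data.frame, dat)
ncol(flat)                 # 4
```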
Regards, Ulrike
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help