On 17/12/2010 10:40 AM, (Ted Harding) wrote:
On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
> Consider this:
>
>> letters[c(2, 3)]
> [1] "b" "c"
>> letters[c(2, NA)]
> [1] "b" NA
>> letters[c(NA, 3)]
> [1] NA "c"
>> letters[c(NA, NA)]
> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA
> [26] NA
>
> The result is a 2-vector in each case until we get to c(NA, NA) and
> then it unexpectedly changes from returning a 2-vector to returning a
> 26-vector. I think most people would have expected that the answer
> would be c(NA, NA).
I'm not sure that it is surprising! Consider
letters[NA]
which returns exactly the same result. Then consider that 'letters' is
simply a 26-element character vector c("a",...). Now consider
x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
x[NA]
# [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
In other words, x[NA] for any vector x will test each index 1:length(x)
against NA, and will find that it's NA, since it doesn't know whether
the index matches or not. Therefore it returns NA for that index, and
will do the same for every index. So it's telling you: "For each of my
elements a,b,c,d,e,f,... I have to tell you that I don't know whether
you want it or not". You also get similar behavior for x==NA.
If anything is surprising (though it too admits a logical
explanation), it is the result
letters[c(2, NA)]
# [1] "b" NA
since the result being asked for by the first element of c(2,NA) is
definite -- so far so good -- but then you would expect it to have the
same problem with what is being asked for by NA. This time, it seems
that because the 2-element vector c(2,NA) is being submitted, its
length overrides the length of the response that would be given for
x[NA]: "You asked for a 2-element extraction from letters; I can see
what you want for the first, but not for the second".
However, that logic does not work for letters[c(NA,NA)] which still
returns the 26-element result!
After all that, I'm inclined to the view that letters[NA] should
return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA),
etc.; and that the same should apply to all vectors accessed by [].
The above behaviour seems to contradict [what I can understand from]
what is said in ?"[":
NAs in indexing:
When extracting, a numerical, logical or character 'NA' index
picks an unknown element and so returns 'NA' in the corresponding
element of a logical, integer, numeric, complex or character
result, and 'NULL' for a list. (It returns '00' for a raw
result.)
since that seems to imply that x[c(NA,NA)] should return c(NA,NA)
and not rep(NA,length(x))!
I don't know where that quote came from, but it is not quite relevant
here. The relevant quote is in the Language Definition, talking about
indices by type of index:
"Logical. The indexing i should generally have the same length as x.
If it is shorter, then its elements will be recycled as discussed in
Section 3.3 [Elementary arithmetic operations], page 14. If it is
longer, then x is conceptually extended with NAs. The selected values
of x are those for which i is TRUE."
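The reason this passage applies to the example is that a bare NA is a
logical constant, so c(NA, NA) is a logical index and gets recycled,
whereas c(2, NA) is numeric. A minimal sketch of the difference (the
variable names are only for illustration):

```r
# A bare NA is logical; combined with a number it is coerced to numeric.
na_type    <- typeof(NA)          # "logical"
mixed_type <- typeof(c(2, NA))    # "double"

# Logical index shorter than the vector: recycled to length(letters).
by_logical <- letters[c(NA, NA)]
length(by_logical)                # 26

# Integer NA index: one result element per index element.
by_integer <- letters[c(NA_integer_, NA_integer_)]
length(by_integer)                # 2
```

So anyone who wants the 2-element answer can ask for it with
NA_integer_ (or as.integer(NA)) instead of plain NA.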
The manual "An Introduction to R" gets this wrong:
"A logical vector. In this case the index vector must be of the same
length as the vector from which elements are to be selected. Values
corresponding to TRUE in the index vector are selected and those
corresponding to FALSE are omitted."
The "must" in that quote is too strong; the Language Definition gets it
right. Perhaps the behaviour described in the Intro manual would be
less confusing: letters[c(NA,NA)] would give an error or warning,
something like "logical index of incorrect length". But I suspect
people rely on the recycling of logical vectors, so there'd be a lot of
complaints if we made that change.
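As an illustration of the kind of code that relies on recycling, and of
the "conceptually extended with NAs" rule for an index that is too
long, here is a small sketch (x and the other names are made up for
the example):

```r
x <- 1:6

# Shorter logical index: recycled to TRUE FALSE TRUE FALSE TRUE FALSE,
# a common idiom for picking every other element.
odd_positions <- x[c(TRUE, FALSE)]

# Longer logical index: the 7th position does not exist in x, so the
# TRUE there selects an NA.
past_the_end <- x[c(rep(FALSE, 6), TRUE)]
```

Here odd_positions is 1 3 5 and past_the_end is a single NA. An error
on length mismatch would break the first idiom.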
Duncan Murdoch
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel