I still have little ability to predict how these functions will treat the
columns of data frames:

All of this is explained by knowing what class of data functions *work on*, and what class of data *you have*.

# Here's a data frame with a column "a" of integers,
# and a column "b" of characters:
df <- data.frame(
+ a = 1:2,
+ b = c("a","b")
+ )
df
  a b
1 1 a
2 2 b

First, let's see what we have. Use str(df):

str(df)
'data.frame':   2 obs. of  2 variables:
 $ a: int  1 2
 $ b: Factor w/ 2 levels "a","b": 1 2

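So we have a data.frame with two variables, one of class integer and one of class factor. Notice that neither is of class character: data.frame() turned the strings in "b" into a factor, which is what it does when stringsAsFactors = TRUE.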
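If you want to see the difference for yourself, here is a small sketch. The names df_factor and df_char are just ones I made up for illustration, and I set stringsAsFactors explicitly because its default depends on your R version (it was TRUE before R 4.0.0).

```r
# Same data, built twice: once with strings converted to factors,
# once with strings kept as character vectors.
df_factor <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)
df_char   <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = FALSE)

# class() reports what each column *is* from the user's point of view.
sapply(df_factor, class)  # a is "integer", b is "factor"
sapply(df_char, class)    # a is "integer", b is "character"
```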

# Except -- both columns are characters:
apply (df, 2, typeof)
          a           b
"character" "character"

See ?apply. The apply function works on *matrices*. You're not passing it a matrix, you're passing it a data.frame. Matrices are two-dimensional vectors and hold *ONE* type. So apply could either

1) report an error saying "give me a matrix"

or


2) try to convert whatever you gave it to a matrix.

apply does (2): it converts the data frame to the best thing it can, a character matrix. It can't be a numeric matrix, since you have mixed types of data, so it goes to the "lowest common denominator", a matrix of characters. This is all explained in the first paragraph of ?apply.
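You can watch the conversion happen by calling as.matrix() yourself, which is roughly what apply does to a data frame before it starts. This is only a sketch: num_df is a made-up second data frame, and I pass stringsAsFactors = TRUE so that it matches the str() output above regardless of R version.

```r
df <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)

# Mixed column types: the whole matrix is coerced to character.
m <- as.matrix(df)
typeof(m)                 # "character"

# apply() sees that character matrix, so every column looks like character.
apply(df, 2, typeof)

# An all-numeric data frame (hypothetical example) stays numeric.
num_df <- data.frame(x = 1:2, y = c(0.5, 1.5))
typeof(as.matrix(num_df)) # "double"
```

If you want to work column by column without the matrix conversion, lapply or sapply on the data frame keeps each column's own type.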


# Except -- they're both integers:
lapply (df, typeof)
$a
[1] "integer"

$b
[1] "integer"

?typeof is probably not very useful for casual R use. I've never used it. More useful is ?class. typeof shows you how R stores things at a low level. Factors are just integer codes with labels, and you have an integer variable and a factor variable, so typeof reports both as integers.

Try lapply(df, class)
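Here is a short sketch of the difference between the two views, again with stringsAsFactors = TRUE set by hand so the factor column is there whatever your R version:

```r
df <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)

sapply(df, typeof)  # storage view: both columns are "integer"
sapply(df, class)   # user view: "integer" and "factor"

# A factor is a vector of integer codes plus a set of labels (levels).
as.integer(df$b)    # the codes
levels(df$b)        # the labels the codes point to
```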

# Except -- only one of those integers is numeric:
lapply (df, is.numeric)
$a
[1] TRUE

$b
[1] FALSE

Yes, because you have a factor, and in the first 3 paragraphs of ?as.numeric, you'd see:

     Factors are handled by the default method, and there
     are methods for classes ‘"Date"’ and ‘"POSIXt"’ (in all three
     cases the result is false).  Methods for ‘is.numeric’ should only
     return true if the base type of the class is ‘double’ or ‘integer’
     _and_ values can reasonably be regarded as numeric (e.g.
     arithmetic on them makes sense).

See, it all makes perfect sense :).
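To see why arithmetic on a factor would not make sense, here is a small sketch with a made-up factor f whose labels happen to look like numbers. The integer codes are not the values you see printed, so is.numeric says FALSE, and you have to go through the labels to get the numbers back.

```r
# Hypothetical factor whose labels look numeric.
f <- factor(c("10", "20", "10"))

is.numeric(f)                          # FALSE, even though storage is integer

codes  <- as.integer(f)                # internal codes: 1 2 1, not 10 20 10
values <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10
```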

My advice? Don't worry about typeof. *Always* know what class your objects are, and what class the functions you're using expect. Use ?str liberally.
