Hello-

I have a rather messy SPSS file which I have imported into R; I've dput'd some of the columns at the end of this message. I wish to get rid of all the labels and get numeric values using as.numeric. The funny thing is that it works like this:

as.numeric(mydata[,2]) # generates correct numbers

However, if I pass the whole data frame at once like this:

apply(mydata, 1:2, function(x) as.numeric(x))

This same column, column 2, generates NAs with a "in FUN(newX[, i], ...) : NAs introduced by coercion" message.

Meanwhile column 3 works fine like this:

as.numeric(mydata[,3]) # generates correct numbers

It also gives numeric results from the apply call.

I think I basically know why: str() tells me that the variables which work okay are "labelled", whereas the ones that don't are "Factor". However, I can't figure out what is special about the apply call that generates the NAs when as.numeric(mydata[,2]) doesn't, and I'm not sure what to do about it in future.

I realise I can just loop over the columns, but I would rather get to the bottom of this if I can so I know for future.
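
For reference, the column-by-column loop I have in mind is roughly the following. It is only a sketch: `toy` is a made-up stand-in for mydata (not my real data) and `converted` is just a name I picked for the result. Each as.numeric() call receives one whole column, which is the form that works for me above.

```r
# Hypothetical toy data standing in for mydata: one integer column
# and one factor column with text labels
toy <- data.frame(
  id = 1:3,
  item = factor(c("a little", "somewhat", "a little"),
                levels = c("Not at all", "a little", "somewhat"))
)

# Convert one column at a time, so that as.numeric() is applied to
# each original column object in turn
converted <- toy
for (j in seq_along(toy)) {
  converted[[j]] <- as.numeric(toy[[j]])
}

str(converted)
```

For the factor column this gives the underlying integer codes (as plain numbers), the same as calling as.numeric on that column directly.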

Thanks in advance for any advice

Chris Beeley
Institute of Mental Health, UK

dput() gives-

structure(list(id = structure(1:79, label = structure("Participant", .Names = "id"), 
class = "labelled"),
    item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L,
    2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L,
    4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L,
    3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L,
    6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L,
    4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all",
    "a little", "somewhat", "quite a lot", "very much", "missing data"
    ), class = c("labelled", "factor"), label = structure("The patients care for each other",
    .Names = "item2_jan11")),
    item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L,
    2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L,
    5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L,
    4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L,
    4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L,
    4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L,
    999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L,
    0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data",
    "very much", "quite a lot", "somewhat", "a little", "Not at all"
    )), label = structure("At times, members of staff are afraid of some of the patients",
    .Names = "item12_jan11"), class = "labelled")), .Names = c("id",
"item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA,
-79L))
