Titus von der Malsburg wrote:
On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
For factors, you better convert them first back to character strings.

  splice <- function(x, y) {
    x <- levels(x)[x]
    y <- levels(y)[y]
    factor(as.vector(rbind(x, y)))
  }

Thank you very much, Thierry!

I failed to mention something important in my last mail: x and y have
the same levels.  (I assume that the integer to level name mapping of
a factor defines its class and that it only makes sense to combine
factors of the same class.)

Say

    > x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

    > x
    [1] b b d d
    Levels: a b c d

    > as.integer(x)
    [1] 2 2 4 4

but

    > splice(x,x)
    [1] b b b b d d d d
    Levels: b d

    > as.integer(splice(x,x))
    [1] 1 1 1 1 2 2 2 2

I'd like to have a splice function that retains the level to label
mapping.  One candidate for a solution is:

    splice <- function(x,y) {
      xy <- as.vector(rbind(x, y))
      if (is.factor(x) && is.factor(y))
        xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
      xy
    }

However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
Levels are underlyingly integers starting from one and going up to
length(levels(x)), and levels(x) gives me the labels of these integers
in an order corresponding to 1:length(levels(x)).

Without these assumptions I see no way to recover the integer to level
name mapping for levels that are defined in a factor but do not occur.
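
The two assumptions can at least be checked at the prompt. What follows
is only a sketch: it assumes the integer codes index into levels(x),
which is how current R behaves as far as I can tell, and the names
codes, labs and x2 are made up for illustration.

```r
# Illustrative check; codes, labs and x2 are made-up names.
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))

codes <- as.integer(x)   # integer codes of the elements
labs  <- levels(x)       # all labels, including the unused "a" and "c"

# Indexing the labels by the codes gives back the element labels.
identical(labs[codes], as.character(x))

# Rebuilding from codes and labels keeps the unused levels.
x2 <- factor(codes, levels = seq_along(labs), labels = labs)
identical(levels(x2), levels(x))
```

If both comparisons come out TRUE, the mapping behaves as assumed on
that R installation; this does not settle whether it is guaranteed.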

I'd be happy if somebody could clarify this issue!

Hm, well,... Some people have been quite insistent that factors should
be thought of as isomorphic to vectors over small subsets of character
strings and not as isomorphic to small integers with labels. I tend to
disagree as it creates more complications than it solves.

Anyways, I would do it like this (generalizing "8" and the seq() bits is left as an exercise)

> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
> xx <- factor(rep(NA,8),levels=levels(x))
> xx[seq(1,8,2)]<-x
> xx[seq(2,8,2)]<-x
> xx
[1] b b b b d d d d
Levels: a b c d
> as.integer(xx)
[1] 2 2 2 2 4 4 4 4
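
One possible way to generalize the "8" and the seq() bits is sketched
below. It is not from the original post: the name splice_factor and the
checks for identical levels and equal lengths are assumptions added for
illustration.

```r
# Hypothetical helper: interleave two factors that share the same levels,
# keeping the full level set, including levels that do not occur.
splice_factor <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y),
            identical(levels(x), levels(y)),
            length(x) == length(y))
  n <- length(x)
  # Start from an all-NA factor that already carries the levels.
  xy <- factor(rep(NA, 2 * n), levels = levels(x))
  xy[2 * seq_len(n) - 1] <- x   # odd positions take x
  xy[2 * seq_len(n)]     <- y   # even positions take y
  xy
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
xx <- splice_factor(x, x)
levels(xx)
as.integer(xx)
```

Using seq_len() rather than seq(1, 2 * n, 2) keeps the function from
failing on zero-length input.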



  Titus

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
