Re: [R] tapply() and using factor() on a factor

Alexander Peterhansl Fri, 16 Oct 2009 08:34:07 -0700

Thank you Mohamed and Bill for your replies.  (I did not send the data
because it is unwieldy.)

Yes Bill, the issue arises directly from what you had guessed.  I was
working with a subset of the data (which implicitly had factors for the
complete data set).

On this, what is the best way to take a subset of the data that drops
these "extraneous" factor levels?

> log<-data.frame(Flag=1:2,
RequestID=factor(letters[1:2],levels=letters[1:10]))
> log2 <-subset(log, RequestID=="a")

> levels(log2$RequestID)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

In other words, how do I take a subset which yields "a" as the only
level for log2?
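
One common approach, sketched here under a few assumptions, is to rebuild
the factor after subsetting so that only the levels still present survive.
The data frame is renamed log_df (a name not used in the thread) to avoid
masking base::log, and droplevels() is assumed to be available; it was
added to R after this thread was written (2.12.0 and later, as far as I
know).

```r
# Same toy data as above, renamed to log_df (assumed name) so that it
# does not mask the base function log().
log_df <- data.frame(Flag = 1:2,
                     RequestID = factor(letters[1:2], levels = letters[1:10]))

# subset() keeps all ten levels even though only "a" remains.
log2 <- subset(log_df, RequestID == "a")

# Option 1: re-apply factor() to the column; unused levels are dropped.
log2$RequestID <- factor(log2$RequestID)
levels(log2$RequestID)

# Option 2 (assumes a version of R that provides droplevels()):
# drop unused levels from every factor column of the subset at once.
log3 <- droplevels(subset(log_df, RequestID == "a"))
levels(log3$RequestID)
```

Option 1 touches a single column and works in any version of R; option 2
is more convenient when the data frame holds several factor columns.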

Alex




-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Thursday, October 15, 2009 11:59 PM
To: Alexander Peterhansl; r-help@r-project.org
Subject: RE: [R] tapply() and using factor() on a factor

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Alexander 
> Peterhansl
> Sent: Thursday, October 15, 2009 2:50 PM
> To: r-help@r-project.org
> Subject: [R] tapply() and using factor() on a factor
> 
> Dear List,
> 
> Shouldn't result1 and result2 be equal in the following case?
> 
> Note that log$RequestID is a factor.  That is,
> is.factor(log$RequestID) yields TRUE.
> 
> result1 <- tapply(log$Flag,factor(log$RequestID),sum)
> result2 <- tapply(log$Flag,log$RequestID,sum)

Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily.  Since you
didn't, I'll guess what the dataset may contain.

If log$RequestID is a factor with lots of unused levels, tapply
will output an NA for each unused level.  factor(log$RequestID)
will create a new set of levels, only those actually used,
so tapply will not be forced to fill those spots with NAs.  E.g.,

> log<-data.frame(Flag=1:2, RequestID=factor(letters[1:2],
levels=letters[1:10]))
> tapply(log$Flag, log$RequestID, sum)
 a  b  c  d  e  f  g  h  i  j
 1  2 NA NA NA NA NA NA NA NA
> tapply(log$Flag, factor(log$RequestID), sum)
a b
1 2

I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.
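
The idea of asking FUN what it returns for empty input can be applied by
hand after the call.  The following is only a sketch of that idea, not
something tapply() does itself; log_df and empty_value are names
introduced here for illustration.

```r
# Toy data from the example above, renamed to log_df (assumed name).
log_df <- data.frame(Flag = 1:2,
                     RequestID = factor(letters[1:2], levels = letters[1:10]))

# Cells for the eight unused levels come back as NA.
res <- tapply(log_df$Flag, log_df$RequestID, sum)

# Ask FUN what it returns for zero-length input: sum() of nothing is 0.
empty_value <- sum(log_df$Flag[0])

# Fill the empty cells with that value.  Caveat: this also overwrites any
# cell where FUN legitimately returned NA, so it is only safe when the
# data itself contains no NAs.
res[is.na(res)] <- empty_value
res
```

Whether 0 is the right filler depends on FUN: it is natural for sum(),
but a function such as mean() returns NaN for empty input, so the NA may
be the more honest answer there.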

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Yet, when I summarize the output, I get the following:
> 
> summary(result1)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   11.00   11.00   11.00   26.06   11.00  101.00
> 
> summary(result2)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>   11.00   11.00   11.00   26.06   11.00  101.00  978.00
> 
> Why does result2 have 978 NA's?
> 
> Any help on this would be appreciated.
> 
> Alex
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

