Got it. Thank you all.

On Mon, Mar 16, 2009 at 4:39 PM, Stavros Macrakis <macra...@alum.mit.edu> wrote:
> The factor approach is horrifically ugly and dangerous.
>
> Even if it didn't have the extraordinarily poor behavior documented
> below, it simply isn't well-defined what it should do. The explicit
> approximation route is far far preferable in every way: more
> predictable, more controllable, and even (though it hardly matters
> usually) faster.
>
> Let's look at the extraordinarily poor behavior I was mentioning. Consider:
>
> nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
> [1] 0.3 0.3 0.3 0.3
>
> Though they all print as .3 with the default precision (which is
> normal and expected), they are all different from .3:
>
> nums - .3 => -3.885781e-16 -2.220446e-16 2.220446e-16 3.885781e-16
>
> When we convert nums to a factor, we get:
>
> fact <- as.factor(nums); fact
> [1] 0.300000000000000 0.3 0.3 0.300000000000000
> Levels: 0.300000000000000 0.3 0.3 0.300000000000000
>
> Not clear what the difference between 0.300000000000000 and 0.3 is
> supposed to be, nor why some 0.300000000000000 are < .3 and others
> are > .3, but let's put that aside for the moment.
>
> Now let's look at the relations among the factor values:
>
> > fact[1]==fact[2]
> [1] FALSE
> > fact[1]==fact[4]
> [1] TRUE
>
> So though nums[1] < nums[2] < nums[3] < nums[4], fact[1] compares
> *unequal* to fact[2] though it compares *equal* to fact[4].
> Apparently R is comparing the *names* of the levels rather than the
> indexes in the factor. This would be weird even if it didn't lead to
> this very bad case.
>
> Hope this helps,
>
> -s
>
>
> On Mon, Mar 16, 2009 at 6:53 PM, Daniel Murphy <chiefmur...@gmail.com>
> wrote:
> > I have a matrix whose columns were filled with values which were
> > functions of cvseq<-seq(.2,.3,by=.1) (and a row value of mode integer).
> > To do a lookup for cv=.3 later, I wanted to match(.3,cvseq), which gave
> > me NA, hence my question.
> > I thought R would match .3 in cvseq within .Machine$double.eps,
> > but I can understand it if .3 and the second element of cvseq would
> > not have identical bits.
> > Besides the helpful suggestions below, I also tried
> >> cvseqf <- as.factor(cvseq)
> >> match(.3,cvseqf)
> > [1] 2
> > which worked.
> > In general, would it be better to go the enumeration route via
> > as.factor or the approximation route?
> > Thanks for the help.
> > -Dan
> >
> > On Mon, Mar 16, 2009 at 8:24 AM, Stavros Macrakis
> > <macra...@alum.mit.edu> wrote:
> >>
> >> Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I assume you
> >> really mean something like seq(from=.2,to=.3,by=.1), which gives
> >> c(0.2, 0.3).
> >>
> >> %in% tests for exact equality, which is almost never a good idea with
> >> floating-point numbers.
> >>
> >> You need to define what exactly you mean by "in" for floating-point
> >> numbers. What sort of tolerance are you willing to allow?
> >>
> >> Some possibilities would be for example:
> >>
> >> approxin <- function(x,list,tol) any(abs(list-x)<tol) # absolute tolerance
> >>
> >> rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
> >>     any(abs((list-x)/x)<=tol,na.rm=TRUE)
> >> # relative tolerance; only exact 0 will match 0
> >>
> >> Hope this helps,
> >>
> >> -s
> >>
> >> On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy <chiefmur...@gmail.com>
> >> wrote:
> >> > Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3).
> >> > I get
> >> >> 0.3 %in% seq(from=.2,to=.3)
> >> > [1] FALSE
> >> > Yet
> >> >> 0.3 %in% c(.2,.3)
> >> > [1] TRUE
> >> > For arbitrary sequences, this "invisible .3" has been problematic.
> >> > What is the best way to work around this?
> >
> >

[[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
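
For reference, here is a minimal sketch of the "approximation route" discussed in this thread, written as a tolerance-based stand-in for match(). The name match_tol and the default tolerance of 1e-8 are assumptions made for illustration; neither comes from the thread or from base R, and the right tolerance depends on the data. The example builds the inexact value as 0.1 + 0.2 rather than with seq(), since what seq() returns for its last element may differ between R versions.

```r
# Hypothetical helper (not part of base R): position of the first element of
# `table` that lies within absolute tolerance `tol` of the scalar `x`,
# or NA if no element is that close.
match_tol <- function(x, table, tol = 1e-8) {
  hits <- which(abs(table - x) <= tol)
  if (length(hits) > 0) hits[1] else NA_integer_
}

# 0.1 + 0.2 is not bit-identical to the literal 0.3 in double precision.
vals <- c(0.2, 0.1 + 0.2)

match(.3, vals)       # exact comparison: NA
match_tol(.3, vals)   # tolerance-based comparison: 2
```

Unlike the factor approach, the tolerance here is explicit, so which values count as "equal" is decided by the caller rather than by how the numbers happen to print.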