On Jul 8, 2010, at 3:23 PM, Muhammad Rahiz wrote:

Hi all,

I'm trying to filter data into respective numbers. For example, if the data ranges from 0 to <0.1, group the data. And so on for the rest of the data. There are inconsistencies in the output. For example, b1[[3]] lumps all the 0.2s and 0.3s together while 0.6s are not in the output.

Any time you are working with floating point numbers you should be using all.equal rather than ==. You could easily be getting bitten by a test for >= that returns FALSE when you expected TRUE.
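A minimal sketch of that point, using made-up values rather than anything from the posted data. The slack value `tol` is an assumption chosen for illustration, not a recommendation from the R documentation:

```r
# Hypothetical values, not taken from the poster's file.
x <- 0.1 * 3                 # stored slightly above 0.3

exact   <- (x == 0.3)        # exact comparison
ge_test <- (0.3 >= x)        # the kind of >= test that can surprise you
approx  <- isTRUE(all.equal(x, 0.3))   # comparison within a tolerance

# One way to make ">=" tolerant: allow a small slack (assumed value).
tol    <- 1e-8
ge_tol <- (0.3 >= x - tol)

c(exact = exact, ge_test = ge_test, approx = approx, ge_tol = ge_tol)
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description, not FALSE, when the values differ.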

Running the function - table(f1) - shows that each of the components/ numbers has x number of elements in them. But this is not showing in the results of the script.

Can anyone assist?


Thanks,

Muhammad




f1 <- read.table("data.txt")
f1 <- f1[which(is.na(f1)==FALSE),1]

f1 is a data.frame, and "[which( ==FALSE), " is the same as "[ !is.na() , ", so you could use

f1 <- f1[ !is.na(f1[,1]), 1]
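To check that the two forms agree, here is a small sketch on a hypothetical one-column data frame (`f1_demo` and its values are invented stand-ins for the result of read.table):

```r
# Hypothetical stand-in for read.table("data.txt"): one numeric column with NAs.
f1_demo <- data.frame(V1 = c(0.1, NA, 0.3, 0.6, NA))

# The original idiom: which() on the logical matrix returned by is.na().
via_which <- f1_demo[which(is.na(f1_demo) == FALSE), 1]

# The shorter idiom: negate is.na() on the column directly.
via_not <- f1_demo[!is.na(f1_demo[, 1]), 1]

identical(via_which, via_not)   # both are plain numeric vectors without NAs
```

Either way the result is a numeric vector, not a data frame, because a single column is selected.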

x0 <- seq(0,1,0.1)
x1 <- x0 +0.1

b1 <- c()
for (a in 1:length(x)){
b1[[a]] <- f1[which(f1 >= x0[a] & f1 < x1[a])]
}

That was really not a minimal example, now was it? I used a very small fraction of your data.

For me this throws an error since x is not defined. Modifying it so x becomes x0 and adding the column number "1" to f1's indexing gets me something like what you are describing. It's undoubtedly a case of FAQ 7.31 ("Why doesn't R think these numbers are equal?").
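A short sketch of where the FAQ 7.31 problem should bite with these particular boundaries. It compares the computed sequence against the same numbers typed as literals; `literals` and `too_high` are names introduced here for illustration:

```r
x0 <- seq(0, 1, by = 0.1)
literals <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)

# Positions where the computed boundary is slightly larger than the literal.
too_high <- which(x0 > literals)
literals[too_high]

# Print with extra digits to see the representation error.
print(x0[too_high], digits = 17)
```

A data value typed as 0.3 is then smaller than the computed boundary for 0.3, so a test such as f1 >= x0[a] puts it in the bin below.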

> b2 <-findInterval(f1[,1], seq(0, 1, by=0.1) )
> str(b2)
 int [1:120] 11 10 9 10 10 7 10 9 9 7 ...
> table(b2)
b2
 2  3  5  6  7  9 10 11
 1 15 17 56 21  5  4  1
> table(f1[,1])

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  1   4  11  17  18  38  21   5   4   1

Notice that the 0.5s and 0.6s get lumped into the same box. Methods for discrete variables are more appropriate here. However, if you know that your numbers are all rounded to the nearest tenth, then add (or subtract) 0.05 to your boundary criteria so you won't run into numerical representation problems. (See below. I'm not sure that cut() will solve your troubles here.)
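Both suggestions can be sketched as follows, assuming the values really are rounded to the nearest tenth. The vector `vals` is made up for illustration and is not the posted data:

```r
# Hypothetical data rounded to the nearest tenth.
vals <- c(0.2, 0.3, 0.3, 0.5, 0.6, 0.6, 0.7, 1)

# Option 1: treat the values as discrete and simply tabulate them.
table(vals)

# Option 2: shift the boundaries by 0.05 so that every value sits in the
# middle of its interval, far away from any representation error.
breaks  <- seq(-0.05, 1.05, by = 0.1)
bin     <- findInterval(vals, breaks)
centers <- (bin - 1) / 10      # interval midpoint for each observation

table(centers)
```

With shifted boundaries the 0.5s and 0.6s land in separate intervals, because no data value is close enough to a boundary for rounding error to matter.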

> table(cut(f1[,1], seq(0,1,by=0.1) , include.lowest=TRUE, right=FALSE ))

  [0,0.1) [0.1,0.2) [0.2,0.3) [0.3,0.4) [0.4,0.5) [0.5,0.6) [0.6,0.7) [0.7,0.8)
        0         1        15         0        17        56        21         0
[0.8,0.9)   [0.9,1]
        5         5

Notice the gaps in the [0.3,0.4) and [0.7,0.8) categories. This may be why the S/R designers chose right=TRUE as the default.
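For comparison, a small sketch of cut() with both settings on one made-up observation per tenth (`vals` and `brks` are illustrative names, not the posted data). It only shows behaviour for this particular grid, where the computed breaks are either exact or slightly too large, and is not a general guarantee:

```r
# Hypothetical data: exactly one observation at each tenth.
vals <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
brks <- seq(0, 1, by = 0.1)

# Left-closed intervals, as in the call above: some bins stay empty.
left_closed <- table(cut(vals, brks, include.lowest = TRUE, right = FALSE))

# Right-closed intervals (the default): each value falls in its own bin here.
right_closed <- table(cut(vals, brks, include.lowest = TRUE))

left_closed
right_closed
```

Shifting the breaks by 0.05 or tabulating the values directly remains the safer choice, since it does not depend on the direction of the rounding error.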


--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
