On 01/01/2015 2:43 PM, Mike Miller wrote:
> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>
>> On 01/01/2015 1:21 PM, Mike Miller wrote:
>>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>>
>>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>>
>>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>>
>>>>>> This is probably a FAQ, and I don't really have a question about
>>>>>> it, but I just ran across this in something I was working on:
>>>>>>
>>>>>>> as.integer(1000*1.003)
>>>>>> [1] 1002
>>>>>>
>>>>>> I didn't expect it, but maybe I should have. I guess it's about
>>>>>> the machine precision added to the fact that as.integer always
>>>>>> rounds down:
>>>>>>
>>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>> [1] 1002
>>>>>>
>>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>> [1] 1003
>>>>>>
>>>>>> This does it right...
>>>>>>
>>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>> [1] 1003
>>>>>>
>>>>>> ...but this seems to always give the same answer and it is a
>>>>>> little faster in my application:
>>>>>>
>>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>> [1] 1003
>>>>>>
>>>>>> FYI - I'm reading in a long vector of numbers from a text file
>>>>>> with no more than three digits to the right of the decimal. I'm
>>>>>> converting them to integers and saving them in binary format.
>>>>>
>>>>> So just add 0.0001 or even .0000001 to all of them and coerce to
>>>>> integer.
>>>>
>>>> I don't think the original problem was stated clearly, so I'm not
>>>> sure whether this is a solution, but it looks wrong to me. If you
>>>> want to round to the nearest integer, why not use round() (without
>>>> the as.integer afterwards)? Or if you really do want an integer,
>>>> why add 0.1 or 0.0001? Why not add 0.5 before calling as.integer()?
>>>> That is the classical way to implement round().
>>>>
>>>> To state the problem clearly, I'd like to know what result is
>>>> expected for any real number x.
>>>> Since R's numeric type only approximates the real numbers, we
>>>> might not be able to get a perfect match, but at least we could
>>>> quantify how close we get. Or is the input really character data?
>>>> The original post mentioned reading numbers from a text file.
>>>
>>> Maybe you'd like to know what I'm really doing. I have 1600 text
>>> files, each with up to 16,000 lines with 3100 numbers per line,
>>> delimited by a single space. The numbers are between 0 and 2,
>>> inclusive, and they have up to three digits to the right of the
>>> decimal. Every possible value in that range will occur in the data.
>>> Some example numbers: 0 1 2 0.325 1.12 1.9. I want to multiply by
>>> 1000 and store them as 16-bit integers (uint16).
>>>
>>> I've been reading in the data like so:
>>>
>>>> data <- scan( file=FILE, what=double(), nmax=3100*16000 )
>>>
>>> At first I tried making the integers like so:
>>>
>>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>>    user  system elapsed
>>>   0.187   0.387   0.574
>>>
>>> I decided I should compare with the result I got using round():
>>>
>>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ; proc.time()-ptm
>>>    user  system elapsed
>>>   1.595   0.757   2.352
>>>
>>> It is a curious fact that only a few of the values from 0 to 2000
>>> disagree between the two methods:
>>>
>>>> table( ints2[ ints2 != ints ] )
>>>
>>>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>>
>>> I understand that it's all about the problem of representing decimal
>>> numbers in binary, but I still find some of the results a little
>>> surprising, like that list of numbers from the table() output. For
>>> another example:
>>>
>>>> 1000+3 - 1000*(1+3/1000)
>>> [1] 1.136868e-13
>>>
>>>> 3 - 1000*(0+3/1000)
>>> [1] 0
>>>
>>>> 2000+3 - 1000*(2+3/1000)
>>> [1] 0
>>>
>>> See what I mean?
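[Since R and Python both use IEEE-754 double arithmetic and correctly rounded decimal parsing, the truncate-versus-round discrepancy can be sketched outside R. This is an editor's sketch, not part of the thread; the file-reading step is imitated by formatting each value to three decimals and re-parsing it, which is an assumption about the file contents, but it yields the same doubles that scan() would produce.]

```python
# 1.003 has no exact binary representation: 1000 * 1.003 lands just
# below 1003, so truncation (what as.integer() does for positive
# numbers) drops to 1002, while rounding recovers 1003.
x = 1000 * 1.003
print(int(x), round(x))  # 1002 1003

# Check all 2001 possible input values, parsed from text the way
# scan() parses them, for disagreement between truncation and rounding:
vals = [float(f"{v / 1000:.3f}") for v in range(2001)]
diff = [v for v, y in zip(range(2001), vals)
        if int(1000 * y) != round(1000 * y)]
print(diff)  # the twelve odd values 1001, 1003, ..., 1023 -- exactly
             # the columns of the table() output above
```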
>>> So there is something special about the numbers around 1000.
>>
>> I think it's really that there is something special about the numbers
>> near 1, and you're multiplying that by 1000.
>>
>> Numbers from 1 to just below 2 are stored as their fractional part,
>> with 52-bit precision. Some intermediate calculations will store them
>> with 64-bit precision. 52 bits gives about 15 or 16 decimal places.
>>
>> If your number x is close to 3/1000, it is stored as the fractional
>> part of 2^9 * x. This gives it an extra 2 or 3 decimal digits of
>> precision, so that's why these values are accurate.
>>
>> If your number x is close to 2.003, it is stored as the fractional
>> part of x/2, i.e. with errors like 1.0015 would have. So I would
>> have guessed that 2.006 would have the same problems as 1.003, but I
>> thought you didn't see that. So I tried it myself, and I do see that:
>>
>>> 1000+3 - 1000*(1+3/1000)
>> [1] 1.136868e-13
>>> 2000+6 - 1000*(2+6/1000)
>> [1] 2.273737e-13
>>
>> Reading more closely, I see that you didn't test this particular
>> case, so there's no contradiction here.
>>
>> The one thing I couldn't think of an explanation for is why other
>> numbers between 1 and 2 don't have the same sorts of problems. So I
>> tried the following:
>>
>> # Set data to 1.000 thru 1.999
>> data <- 1 + 0:999/1000
>>
>> # Find the errors
>> errors <- 1000 + 0:999 - 1000*data
>>
>> # Plot them
>> plot(data, errors)
>>
>> The plot doesn't show a uniform distribution, but it is much more
>> uniform than yours, so I think your data doesn't really cover all
>> possible values from 0.000 to 1.999. (I get a similar plot if I look
>> at cases where ints != ints2 with my data.)
>
> No, my data do cover all of the values in the range. (I checked that
> by listing all of them in a file:
>
> tr ' ' '\n' < FILE | uniq | sort -n | uniq >| uniq.txt
>
> They are definitely all there -- all 2001 of them.)
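[The storage-precision argument above can be quantified with the spacing between adjacent doubles (one "ulp"): near 3/1000 the spacing is 2^-61, near 1.003 it is 2^-52, so the sum 1 + 3/1000 carries roughly 512 times the absolute representation error of 3/1000 alone. An editor's sketch in Python, which uses the same IEEE-754 doubles as R:]

```python
import math

# math.ulp(x) (Python 3.9+) is the gap between x and the next double.
print(math.ulp(3 / 1000))      # 2**-61, about 4.34e-19
print(math.ulp(1 + 3 / 1000))  # 2**-52, about 2.22e-16

# The larger absolute error near 1 survives multiplication by 1000:
print(1000 + 3 - 1000 * (1 + 3 / 1000))  # about 1.14e-13, as in the thread
print(3 - 1000 * (0 + 3 / 1000))         # exactly 0.0
print(2000 + 6 - 1000 * (2 + 6 / 1000))  # about 2.27e-13
```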
> The thing is, your way of making the numbers is a little different
> from reading them from a file. You construct the number through an
> arithmetic operation and you get the error:
>
>> as.integer( 1000 * (1+118/1000) )
> [1] 1117
>
> (Your first error-producing value was 1.118.) But when the number is
> read from a file, it starts as character and is translated to numeric.
> So I start with the arithmetic, convert to character, back to numeric,
> and then I do not see the error:
>
>> as.integer( 1000 * as.numeric( as.character( 1+118/1000 ) ) )
> [1] 1118
>
> That reflects what was happening in my work. To see what I'm seeing,
> you have to do this:
>
> data <- 1 + 0:999/1000
> errors <- 1000 + 0:999 - 1000 * as.numeric( as.character( data ) )
> plot(data, errors)
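[The round trip described above can be sketched with the same doubles in Python. One caveat in translating: Python's str() prints the shortest decimal that parses back to the identical double, so it never loses information; R's as.character(), which formats to 15 significant digits by default, is imitated here with the "%.15g" format. This is an editor's sketch, not the thread's own code.]

```python
a = 1 + 118 / 1000    # built by arithmetic: the sum's rounding error
b = float("1.118")    # parsed from text, the way scan() reads the file
print(a == b)         # False: two adjacent but distinct doubles

print(int(1000 * a))  # 1117 -- the arithmetic value truncates low
print(int(1000 * b))  # 1118 -- the parsed value does not

# Formatting to 15 significant digits prints both as "1.118", so the
# character round trip collapses a onto b:
print(float(f"{a:.15g}") == b)  # True
```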
That's really interesting! I guess the differences basically cancel out.

Duncan Murdoch

> Or you could cover the full range from 0 to 2:
>
> data <- 0:2000/1000
> errors <- 0:2000 - 1000 * as.numeric( as.character( data ) )
> plot(data, errors)
>
> Mike

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.