On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:

[...]

> Fundamentally, these numbers have between 0 and 4 decimal digits of
> precision,
I'm surprised that you have a source of data with variable precision,
especially one that varies by a factor of TEN THOUSAND. The difference
between 0 and 4 decimal digits is equivalent to measuring some lengths
to the nearest metre, some to the nearest centimetre, and some to the
nearest 0.1 of a millimetre. That's very unusual, and I don't know what
justification you have for combining such a mix of data sources.

One possible interpretation of your post is that you have a source of
floats where all the numbers are actually measured to the same
precision, and you've simply misinterpreted the fact that some of them
look like they have less precision. Since you indicate that 4 decimal
digits is the maximum, I'm going with 4 decimal digits. So if your data
includes the float 23.5, that's 23.5 measured to a precision of four
decimal places (that is, it's 23.5000, not 23.5001 or 23.4999).

On the other hand, if you're getting your values as *strings*, that's
another story. If you can trust the strings, they'll tell you how many
decimal places there are: "23.5" has only one decimal place, "23.5000"
has four. But then what to make of your later example?

> 40.75280000000001 ==> 4

Python floats (C doubles) are quite capable of distinguishing between
40.7528 and 40.75280000000001. They are distinct numbers:

py> 40.75280000000001 - 40.7528
7.105427357601002e-15

so if a number is recorded as 40.75280000000001, presumably that is
because it was measured as 40.75280000000001. (How that precision can
be justified, I don't know! Does it come from the Large Hadron
Collider?) If it were intended to be 40.7528, I expect it would have
been recorded as 40.7528. What reason do you have to think that
something recorded to 14 decimal places was only intended to have been
recorded to 4?

Without knowing more about how your data is generated, I can't advise
you much, but the whole scenario as you have described it makes me
think that *somebody* is doing something wrong. Perhaps you need to
explain why you're doing this, as it seems numerically broken.

> Is there any clean way to do that? The best I've come up with so far
> is to str() them and parse the remaining string to see how many
> digits it put after the decimal point.

I really think you need to go back to the source. Trying to infer the
precision of the measurements from the accident of the string
formatting seems pretty dubious to me. But if you do want to infer the
number of digits after the decimal place, excluding trailing zeroes
(why, I do not understand), up to a maximum of four digits, then you
could do:

s = "%.4f" % number  # round to four decimal places
s = s.rstrip("0")    # drop trailing zeroes, significant or not
count = len(s.split(".")[1])

That assumes all the numbers fall in the range where they are displayed
in non-exponential format. If you have to handle numbers like 1.23e19
as well, you'll have to parse the string more carefully. (Keep in mind
that beyond a certain size, every float is integer-valued.)

> The numbers are given to me as Python floats; I have no control over
> that.

If that's the case, what makes you think that two floats from the same
data set were measured to different precision? Given that you don't see
strings, only floats, I would say that your problem is unsolvable.
Whether I measure something to one decimal place and get 23.5, or to
four decimal places and get 23.5000, the float you see will be the
same.
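For example (a quick interactive check of my own, not taken from your
data):

py> float("23.5") == float("23.5000")
True
py> repr(float("23.5000"))
'23.5'

Once the string has been converted, any record of the trailing zeroes
is gone.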
Perhaps you ought to be using Decimal rather than float. Floats have a
fixed precision, while Decimals can be configured, and a Decimal built
from a string keeps the trailing zeroes it was given. Then the right
way to answer your question is to inspect the number:

py> from decimal import Decimal as D
py> x = D("23.5000")
py> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 3, 5, 0, 0, 0), exponent=-4)

The number of decimal digits of precision is -exponent.

> I'm willing to accept the fact that I won't be able to differentiate
> between float("38.0") and float("38.0000"). Both of those map to 1,
> which is OK for my purposes.

That seems... well, "bizarre and wrong" are the only words that come to
mind. If I were recording data as "38.0000" and you told me I had
measured it to only one decimal place of accuracy, I wouldn't be too
pleased. Maybe if I understood the context better? How about 38.12 and
38.1200?

By the way, you contradict yourself here. Earlier, you described 38.0
as having zero decimal places (which is wrong). Here you describe it as
having one, which is correct, and then in a later post you describe it
as having zero decimal places again.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/