On Fri, 13 Apr 2012 18:09:24 +0200, Stas Malyshev <smalys...@sugarcrm.com> wrote:

There are other situations where the result of the comparison may be
"inaccurate" -- in the sense that two strings may be constructed as
representing different numbers, but they compare equal.

* Comparing two different real numbers that map to the same double
precision number:
var_dump("1.9999999999999999" == "2"); //true

For floats, there's no accurate comparison anyway, it is a known fact.

However, you are not comparing floats, you're comparing strings. As I showed, floats in strings are already treated differently depending on whether they're in string form or not (1e400 == 1e400 vs "1e400" == "1e400"). What's under discussion is once again whether to treat a proper integer differently from an integer in string form.
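
For reference, the collision in the example above is just rounding to the nearest double, which a few lines of C can check. This is only an illustration: same_double is a hypothetical helper, not engine code, and it assumes IEEE 754 doubles and a correctly rounding strtod().

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if both decimal strings round to the
 * same double-precision value, 0 otherwise. */
int same_double(const char *a, const char *b)
{
    return strtod(a, NULL) == strtod(b, NULL);
}
```

Under those assumptions, same_double("1.9999999999999999", "2") returns 1, and so does same_double("9223372036854775807", "9223372036854775808"), since both of those strings round to 2^63.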

[...]
However, taking the last case as an example, this is the same thing that
happens if you compare:
var_dump((int)"9223372036854775807" == (double)"9223372036854775808");
//true

This, however, is a different case since you explicitly coerce the types
and you must know that both conversions are lossy. It's like doing
substr($a, 0, 1) == substr($b, 0, 1) - of course it can return true even
if $a and $b are different. When you convert a bigger type (string) to a
smaller type (int) you must accept the potential loss or check for it if
it's important.
However I think it would make sense not to use this conversion in string
comparisons when we know it's lossy - it seems to be outside of the use
case for such comparisons and it seems apparent by now that it is hard
for people to understand why it works this way.

First, I don't think this discussion gets any clearer by using ambiguous terms such as "lossy" and saying "lossy is bad". Is (int) " 02" a lossy conversion -- you lose the space and the 0? What about even (float) "1" -- an infinite number of real numbers map to 1. because of rounding, and you have no way to know which one was the original? And in any case, I don't think you mean that (int)"9223372036854775807" is a lossy conversion, as it results in 9223372036854775807 (depending on the width of long, of course). (By the way, these are rhetorical questions; I don't care about establishing a definition of "lossy" in this thread.)

In any case, your selective quoting destroyed the main point of my e-mail -- that is, this problem raises these questions: is "9223372036854775808" different from 9223372036854775808? Is "9223372036854775808" still deemed to represent an integer, even though we cannot represent it as an integer type?

I think most people can agree that this behavior is correct:

var_dump(9223372036854775807 == 9223372036854775808); //true

therefore, we need some -- principled -- distinction to treat the case "9223372036854775807" == "9223372036854775808" differently. The distinction I propose is answering "yes" to the questions above -- they represent different entities, and when the integer string cannot be converted to the integer type we should fall back to memcmp(). This is what is already done with the overflowing "1e400". I don't find it particularly convincing, though.
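
To make the memcmp() fallback concrete, here is a rough C sketch of the integer-string path only. The names parse_exact_int and proposed_equal are hypothetical, not functions from the engine, and I use long long instead of long so the result doesn't depend on the platform. Float strings and the rest of the existing numeric-string handling are deliberately left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse s as a base-10 integer that fits in a
 * long long. Returns 1 and stores the value on success; returns 0 if s
 * is not entirely an integer or if the value overflows. */
int parse_exact_int(const char *s, long long *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}

/* Hypothetical comparison: compare as integers only when both strings
 * convert without overflow; otherwise fall back to a byte comparison. */
int proposed_equal(const char *a, const char *b)
{
    long long x, y;
    size_t la, lb;

    if (parse_exact_int(a, &x) && parse_exact_int(b, &y))
        return x == y;

    la = strlen(a);
    lb = strlen(b);
    return la == lb && memcmp(a, b, la) == 0;
}
```

With a 64-bit long long, proposed_equal("9223372036854775807", "9223372036854775808") returns 0, because the second string overflows and the comparison falls back to bytes, while " 02" and "2" still compare equal.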

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
