I'm wondering whether it's technically feasible to allow every place where
such a conversion could fail (e.g. internal functions, stream handlers,
the INI reader, etc.) to throw an exception.
At 02:36 PM 4/24/2006, Andrei Zmievski wrote:
So, no particular opinions on this, aside from Markus's? I hoped
this proposal would mollify both camps.
-Andrei
On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:
I've had some time to think about this and Derick and I also kicked
around some ideas in a private conversation.
The situation I am talking about really concerns exceptional
circumstances, such as an ISO-8859-1 string being treated as UTF-8,
or some other condition that results in illegal sequences. This
is very different from an unassigned character condition, which is
handled by the SUBST, SKIP, etc. callbacks. I disagree with the notion
that this is similar to the (int)"foo" example. There, we have
well-defined semantics that say "strings not starting with a number get
converted to 0". Treating ISO-8859-1 data as UTF-8 is simply
invalid and bad behavior and should not be encouraged by silently
ignoring the conversion error.
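As a minimal illustration (in Python rather than PHP, purely as an analogy; the variable names are hypothetical), the same bytes decode cleanly when read as ISO-8859-1 but form an illegal sequence when read as UTF-8. A strict converter should report exactly this case rather than paper over it:

```python
# "café" encoded as ISO-8859-1: "é" becomes the single byte 0xE9.
latin1_bytes = "café".encode("iso-8859-1")

# Read with the right encoding, the data round-trips.
as_latin1 = latin1_bytes.decode("iso-8859-1")

# Read as UTF-8, 0xE9 opens a multi-byte sequence that never completes,
# so the strict decoder reports an illegal sequence at offset 3.
try:
    latin1_bytes.decode("utf-8")
    utf8_error_offset = None
except UnicodeDecodeError as err:
    utf8_error_offset = err.start

print(as_latin1)          # café
print(utf8_error_offset)  # 3
```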
Now, I understand that there is resistance to the use of exceptions
in this case and I see the point of those who are against them. My
problem is this: if we do not throw exceptions, then all we are
left with is a warning, which is not helpful if you want to
determine in a programmatic fashion whether there was a conversion
error. Sure, you can check the return value of unicode_decode(), or
maybe even fread() and such, but it does not help with casting,
concatenation, and other similar operations. So we do need a
mechanism for this, and it has to be a fairly flexible one, because
libraries may want to do one thing on failure and the application
itself another.
The best Derick and I could come up with is a user-specified
conversion error handler. It would be invoked only when the
converter encounters an illegal sequence or other serious error.
The existing SUBST, SKIP, etc. error modes would still apply. The
error handler signature would be something like:
function my_handler($direction, $encoding, $string, $char_byte, $offset) { .. }
Where $direction is the direction of conversion (FROM_UNICODE or
TO_UNICODE), $encoding is the name of the encoding in use during
the attempted conversion, $string is the source string that the
converter tried to process, $char_byte is either the failed Unicode
character or the failed byte sequence (depending on the direction), and $offset is
the offset of that character/byte sequence in the source string.
The user error handler is then free to silence the warning, throw
an exception (throw new UnicodeConversionException($message,
$direction, $char_byte, $offset)), or do something else. I have not
yet decided whether it's a good idea to allow the user handler to
continue the conversion. I'd rather the conversion always stopped.
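Python's codec error handlers (`codecs.register_error`) are a close analogue of this proposal, so here is a minimal sketch in Python, not PHP, of how a handler with this signature could be wired in. `my_handler`, the direction constants and `UnicodeConversionException` mirror the names above but are hypothetical stand-ins; the adapter `_dispatch` and the registered name `user_conversion_handler` are invented for illustration. One difference: Python invokes the callback for unmappable characters as well as illegal sequences, whereas the proposal reserves the handler for illegal sequences only.

```python
import codecs

# Direction constants named after those in the proposal (plain strings here).
FROM_UNICODE = "FROM_UNICODE"
TO_UNICODE = "TO_UNICODE"


class UnicodeConversionException(Exception):
    """Hypothetical stand-in for the proposed exception."""

    def __init__(self, message, direction, char_byte, offset):
        super().__init__(message)
        self.direction = direction
        self.char_byte = char_byte
        self.offset = offset


def my_handler(direction, encoding, string, char_byte, offset):
    """User handler with the proposed parameter list. This one chooses to
    throw, which also guarantees that the conversion stops."""
    message = (f"{direction} conversion with {encoding} failed at offset "
               f"{offset}: {char_byte!r}")
    raise UnicodeConversionException(message, direction, char_byte, offset)


def _dispatch(exc):
    """Adapt Python's codec error callback to the proposed signature."""
    if isinstance(exc, UnicodeDecodeError):
        direction = TO_UNICODE      # bytes -> text
    elif isinstance(exc, UnicodeEncodeError):
        direction = FROM_UNICODE    # text -> bytes
    else:
        raise exc
    failed = exc.object[exc.start:exc.end]  # offending byte(s) or character(s)
    my_handler(direction, exc.encoding, exc.object, failed, exc.start)
    # A handler that returned normally would have to supply a
    # (replacement, resume_offset) tuple here; this sketch always stops.
    raise exc


codecs.register_error("user_conversion_handler", _dispatch)

# Latin-1 bytes read as UTF-8: an illegal byte sequence.
try:
    b"caf\xe9".decode("utf-8", errors="user_conversion_handler")
except UnicodeConversionException as e:
    print(e.direction, e.offset, e.char_byte)  # TO_UNICODE 3 b'\xe9'

# A lone surrogate cannot be encoded to UTF-8: an illegal code point.
try:
    "a\ud800b".encode("utf-8", errors="user_conversion_handler")
except UnicodeConversionException as e:
    print(e.direction, e.offset)  # FROM_UNICODE 1
```

Python's API does let a handler resume by returning a replacement, which corresponds to the open question above; raising from the handler gives the "always stop" behavior.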
-Andrei
On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:
Yeah, but we can't tailor only to the default. If you cast "abc" to
an integer today, PHP will do the conversion (i.e. to 0). I think we
should stick to that paradigm and provide users with validation
methods if they want to strictly validate...
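Andi's alternative, a lenient conversion that always succeeds plus a separate validation step that callers opt into, could look roughly like this. The sketch is in Python for illustration only; `to_unicode_lenient` and `is_valid_encoding` are hypothetical names, and U+FFFD substitution stands in for whatever lenient behavior PHP would actually choose.

```python
def to_unicode_lenient(data: bytes, encoding: str) -> str:
    """Never fails: illegal sequences become U+FFFD, much as (int)"abc" becomes 0."""
    return data.decode(encoding, errors="replace")


def is_valid_encoding(data: bytes, encoding: str) -> bool:
    """Opt-in strict check the caller runs before trusting the data."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


raw = "café".encode("iso-8859-1")  # Latin-1 bytes mislabelled as UTF-8
if not is_valid_encoding(raw, "utf-8"):
    print("input is not valid UTF-8")
print(to_unicode_lenient(raw, "utf-8") == "caf\ufffd")  # True
```

The trade-off Andrei points out still applies: casts and concatenation give the caller no place to call the validator, so only explicit conversion calls benefit.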
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php