So, no particular opinions on this, aside from Markus's? I hoped this
proposal would mollify both camps.
-Andrei
On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:
I've had some time to think about this and Derick and I also kicked
around some ideas in a private conversation.
The situation I am talking about is really about exceptional
circumstances, such as an ISO-8859-1 string being treated as a UTF-8
one or some other condition that results in illegal sequences. This is
very different from an unassigned character condition, which is
handled by the SUBST, SKIP, etc. callbacks. I disagree with the notion
that this is similar to the (int)"foo" example. There, we have
well-defined semantics that say "strings not starting with a number
get converted to 0". Treating ISO-8859-1 data as UTF-8 is simply
invalid and bad behavior and should not be encouraged by silently
ignoring the conversion error.
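To make the contrast concrete, here is a small sketch in present-day
PHP (not the Unicode-enabled engine under discussion). The helper name
looks_like_utf8() is made up for illustration; it relies on
preg_match() with the /u modifier, which fails when the subject is not
valid UTF-8.

```php
<?php
// Well-defined semantics: a string not starting with a number casts to 0.
$n = (int)"foo";

// "café" in ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";
// The same text in UTF-8: the é is the two bytes 0xC3 0xA9.
$utf8 = "caf\xC3\xA9";

// Hypothetical helper (not a PHP built-in). preg_match() returns 1 for
// an empty pattern on valid UTF-8 and false on malformed UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump($n);                        // int(0)
var_dump(looks_like_utf8($utf8));    // bool(true)
var_dump(looks_like_utf8($latin1));  // bool(false)
```

The cast has a defined answer, while the ISO-8859-1 bytes have no
meaning at all as UTF-8, which is the case the error handler is meant
for.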
Now, I understand that there is resistance to the use of exceptions in
this case and I see the point of those who are against them. My
problem is this: if we do not throw exceptions, then all we are left
with is a warning, which is not helpful if you want to determine in a
programmatic fashion whether there was a conversion error. Sure, you
can check the return value of unicode_decode(), or maybe even fread()
and such, but it does not help with casting, concatenation, and other
similar operations. So, we do need a mechanism for this, and it has to
be a fairly flexible one because libraries may want to do one thing on
failure and the application itself another.
The best Derick and I could come up with is a user-specified
conversion error handler. It would be invoked only when the converter
encounters an illegal sequence or other serious error. The existing
subst, skip, etc. error modes would still apply. The error handler
signature would be something like:
    function my_handler($direction, $encoding, $string, $char_byte, $offset) { .. }
Where $direction is the direction of conversion (FROM_UNICODE or
TO_UNICODE), $encoding is the name of the encoding in use during the
attempted conversion, $string is the source string that the converter
tried to process, $char_byte is either the failed Unicode character or
the failed byte sequence (depending on direction), and $offset is the
offset of that character/byte sequence in the source string. The user
error handler is then free to silence the warning, throw an exception
(throw new UnicodeConversionException($message, $direction,
$char_byte, $offset)), or do something else. I have not yet decided
whether it's a good idea to allow the user handler to continue the
conversion or not. I'd rather the conversion always stopped.
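For illustration only, here is a rough userland simulation of how such
a handler might be used. Nothing in it exists in PHP: the TO_UNICODE
and FROM_UNICODE constants, the UnicodeConversionException class, and
the decode_utf8() stand-in for the converter are all assumptions made
up for this sketch. The stand-in reports only the first offending
byte, not a whole byte sequence.

```php
<?php
// Hypothetical names throughout; this only simulates the proposal.
const TO_UNICODE = 1;
const FROM_UNICODE = 2;

class UnicodeConversionException extends Exception
{
    public $direction;
    public $charByte;
    public $offset;

    public function __construct($message, $direction, $charByte, $offset)
    {
        parent::__construct($message);
        $this->direction = $direction;
        $this->charByte  = $charByte;
        $this->offset    = $offset;
    }
}

// Stand-in for the converter: checks that $bytes is well-formed UTF-8
// and, on the first illegal sequence, calls $handler and stops.
function decode_utf8(string $bytes, callable $handler): ?string
{
    // Byte-mode regex matching the longest well-formed UTF-8 prefix.
    $valid = '/^(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
           . '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
           . '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*/';
    preg_match($valid, $bytes, $m);
    $offset = strlen($m[0]);
    if ($offset === strlen($bytes)) {
        return $bytes; // no illegal sequence found
    }
    $handler(TO_UNICODE, 'UTF-8', $bytes, $bytes[$offset], $offset);
    return null; // conversion always stops once the handler has run
}

// A user handler with the five-argument signature from the proposal;
// this one chooses to throw.
function my_handler($direction, $encoding, $string, $char_byte, $offset)
{
    $message = sprintf('Illegal byte 0x%02X at offset %d while decoding %s',
                       ord($char_byte), $offset, $encoding);
    throw new UnicodeConversionException($message, $direction, $char_byte, $offset);
}

try {
    decode_utf8("caf\xE9", 'my_handler'); // ISO-8859-1 bytes, not UTF-8
} catch (UnicodeConversionException $e) {
    echo $e->getMessage(), "\n";
    // prints: Illegal byte 0xE9 at offset 3 while decoding UTF-8
}
```

A library could install a handler that throws, as above, while an
application could pass one that merely logs and returns; in this
sketch the conversion stops either way.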
-Andrei
On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:
Yeah, but we can't only tailor to the default. If you cast "abc" to an
integer today, PHP will do the conversion (yielding 0). I think we should
stick to that paradigm and provide users with validation methods if
they want to strictly validate...
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php