I'm wondering whether it's technically feasible to allow every place where such a conversion could fail (e.g. internal functions, stream handlers, the INI reader, etc.) to throw an exception.

At 02:36 PM 4/24/2006, Andrei Zmievski wrote:
So, no particular opinions on this, aside from Markus's? I hoped this proposal would mollify both camps.

-Andrei

On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:

I've had some time to think about this and Derick and I also kicked around some ideas in a private conversation.

The situation I am talking about is really about exceptional circumstances, such as an ISO-8859-1 string being treated as a UTF-8 one, or some other condition that results in illegal sequences. This is very different from an unassigned character condition, which is handled by the SUBST, SKIP, etc. callbacks. I disagree with the notion that this is similar to the (int)"foo" example. There, we have well-defined semantics that say "strings not starting with a number get converted to 0". Treating ISO-8859-1 data as UTF-8 is simply invalid, and such bad behavior should not be encouraged by silently ignoring the conversion error.
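To make the distinction concrete, here is a minimal sketch that runs on current PHP. It does not use the Unicode API discussed in this thread; the helper name is_valid_utf8() is made up for illustration, and it relies only on the fact that PCRE's /u modifier rejects subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): with the /u modifier, preg_match()
// returns false instead of 1 when the subject is not valid UTF-8.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Well-defined cast semantics: a string not starting with a number becomes 0.
$n = (int) "foo";

// "cafe" with an e-acute, encoded as ISO-8859-1. The lone 0xE9 byte is an
// illegal sequence if the same bytes are interpreted as UTF-8.
$latin1 = "caf\xE9";
$latin1_is_utf8 = is_valid_utf8($latin1);

// The same text correctly encoded as UTF-8 (0xC3 0xA9) passes the check.
$utf8 = "caf\xC3\xA9";
$utf8_is_utf8 = is_valid_utf8($utf8);

var_dump($n, $latin1_is_utf8, $utf8_is_utf8);
```

The cast has a defined answer for every input, while the mislabelled ISO-8859-1 bytes have no valid interpretation as UTF-8 at all.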

Now, I understand that there is resistance to the use of exceptions in this case, and I see the point of those who are against them. My problem is this: if we do not throw exceptions, then all we are left with is a warning, which is not helpful if you want to determine programmatically whether there was a conversion error. Sure, you can check the return value of unicode_decode(), or maybe even fread() and such, but that does not help with casting, concatenation, and other similar operations. So we do need a mechanism for this, and it has to be a fairly flexible one, because libraries may want to do one thing on failure and the application itself another.

The best Derick and I could come up with is a user-specified conversion error handler. It would be invoked only when the converter encounters an illegal sequence or other serious error. The existing subst, skip, etc error modes would still apply. The error handler signature would be something like:

function my_handler($direction, $encoding, $string, $char_byte, $offset) { .. }

Where $direction is the direction of conversion (FROM_UNICODE or TO_UNICODE), $encoding is the name of the encoding in use during the attempted conversion, $string is the source string that the converter tried to process, $char_byte is either the failed Unicode character or the failed byte sequence (depending on direction), and $offset is the offset of that character/byte sequence in the source string. The user error handler is then free to silence the warning, throw an exception (throw new UnicodeConversionException($message, $direction, $char_byte, $offset)), or do something else. I have not yet decided whether it's a good idea to allow the user handler to continue the conversion or not. I'd rather the conversion always stopped.
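A rough sketch of what such a handler might look like when it chooses to throw. Everything here is assumed for illustration: the constant values, the body of UnicodeConversionException, and the message format are not specified anywhere in the proposal. The proposal also does not say how a handler would be registered, so the sketch simply calls the handler directly where the converter would.

```php
<?php
// Hypothetical values for the proposed direction constants.
const TO_UNICODE = 1;
const FROM_UNICODE = 2;

// Hypothetical exception class; the proposal only names it and its arguments.
class UnicodeConversionException extends Exception
{
    public $direction;
    public $charByte;
    public $offset;

    public function __construct($message, $direction, $charByte, $offset)
    {
        parent::__construct($message);
        $this->direction = $direction;
        $this->charByte = $charByte;
        $this->offset = $offset;
    }
}

// Handler with the proposed signature; this one escalates to an exception.
function my_handler($direction, $encoding, $string, $char_byte, $offset)
{
    $message = sprintf(
        'Illegal sequence 0x%s at offset %d while converting %s',
        bin2hex($char_byte),
        $offset,
        $encoding
    );
    throw new UnicodeConversionException($message, $direction, $char_byte, $offset);
}

// Stand-in for the converter hitting byte 0xE9 at offset 3 of an
// ISO-8859-1 string that was wrongly labelled as UTF-8.
$caught = null;
try {
    my_handler(TO_UNICODE, 'UTF-8', "caf\xE9", "\xE9", 3);
} catch (UnicodeConversionException $e) {
    $caught = $e;
}
```

A library could install a handler like this one, while an application might prefer a handler that only logs the failure; that difference is the flexibility argued for above.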

-Andrei

On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:

Yeah, but we can't tailor only to the default. If you cast "abc" to an integer today, PHP will do the conversion (the result is 0). I think we should stick to that paradigm and provide users with validation methods if they want to validate strictly...

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
