Re: [PHP-DEV] Re: Unicode conversion exceptions and memory leaks

Andrei Zmievski Mon, 24 Apr 2006 14:38:12 -0700

So, no particular opinions on this, aside from Markus's? I hoped this proposal would mollify both camps.


-Andrei


On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:

I've had some time to think about this, and Derick and I also kicked around some ideas in a private conversation.
The situation I am talking about is really about exceptional circumstances, such as an ISO-8859-1 string being treated as a UTF-8 one, or some other condition that results in illegal sequences. This is very different from an unassigned character condition, which is handled by the SUBST, SKIP, etc. callbacks. I disagree with the notion that this is similar to the (int)"foo" example. There, we have well-defined semantics that say "strings not starting with a number get converted to 0". Treating ISO-8859-1 data as UTF-8 is simply invalid and bad behavior, and should not be encouraged by silently ignoring the conversion error.
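To make the contrast concrete, here is a minimal sketch in current PHP. It is only an illustration, not part of the proposal: the sample string is made up, and the UTF-8 check relies on PCRE's /u modifier rather than on any of the Unicode conversion functions discussed in this thread.

```php
<?php
// An explicit cast has defined semantics: a non-numeric string becomes 0.
$asInt = (int) "foo";

// "café" encoded as ISO-8859-1 (illustrative sample). In UTF-8, the byte 0xE9
// starts a three-byte sequence, but no continuation bytes follow it here.
$latin1 = "caf\xE9";

// The /u modifier makes PCRE validate the subject as UTF-8;
// preg_match() returns false when the subject is not valid UTF-8.
$isValidUtf8 = preg_match('//u', $latin1) === 1;

var_dump($asInt, $isValidUtf8);
```

The cast yields a usable value by rule, whereas the ISO-8859-1 bytes have no valid reading as UTF-8 at all, which is the distinction being drawn above.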
Now, I understand that there is resistance to the use of exceptions in this case, and I see the point of those who are against them. My problem is this: if we do not throw exceptions, then all we are left with is a warning, which is not helpful if you want to determine programmatically whether there was a conversion error. Sure, you can check the return value of unicode_decode(), or maybe even fread() and such, but that does not help with casting, concatenation, and other similar operations. So we do need a mechanism for this, and it has to be a fairly flexible one, because libraries may want to do one thing on failure and the application itself another.
The best Derick and I could come up with is a user-specified conversion error handler. It would be invoked only when the converter encounters an illegal sequence or other serious error. The existing subst, skip, etc. error modes would still apply. The error handler signature would be something like:
function my_handler($direction, $encoding, $string, $char_byte, $offset) { .. }
Where $direction is the direction of conversion (FROM_UNICODE or TO_UNICODE), $encoding is the name of the encoding in use during the attempted conversion, $string is the source string that the converter tried to process, $char_byte is either the failed Unicode character or the failed byte sequence (depending on direction), and $offset is the offset of that character/byte sequence in the source string. The user error handler is then free to silence the warning, throw an exception (throw UnicodeConversionException($message, $direction, $char_byte, $offset)), or do something else. I have not yet decided whether it's a good idea to allow the user handler to continue the conversion or not. I'd rather the conversion always stopped.
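As a rough sketch of how such a handler might look in user code: everything below is hypothetical. The FROM_UNICODE and TO_UNICODE constants and the UnicodeConversionException class are defined locally as stand-ins, since the proposal only names them, and the handler is called by hand because no registration function has been specified.

```php
<?php
// Hypothetical stand-ins for the proposed direction constants.
const FROM_UNICODE = 1;
const TO_UNICODE   = 2;

// Hypothetical exception carrying the details named in the proposal.
class UnicodeConversionException extends Exception
{
    public $direction;
    public $charByte;
    public $offset;

    public function __construct($message, $direction, $charByte, $offset)
    {
        parent::__construct($message);
        $this->direction = $direction;
        $this->charByte  = $charByte;
        $this->offset    = $offset;
    }
}

// A user handler with the proposed signature; this one chooses to throw.
function my_handler($direction, $encoding, $string, $char_byte, $offset)
{
    $message = sprintf(
        'Illegal sequence 0x%s at offset %d while converting %s',
        bin2hex($char_byte),
        $offset,
        $encoding
    );
    throw new UnicodeConversionException($message, $direction, $char_byte, $offset);
}

// Simulate the converter reporting an illegal byte at offset 3
// of an ISO-8859-1 string that was wrongly treated as UTF-8.
$source = "caf\xE9";
$caught = null;
try {
    my_handler(TO_UNICODE, 'UTF-8', $source, "\xE9", 3);
} catch (UnicodeConversionException $e) {
    $caught = $e;
}
```

A library could install a handler that throws like this one, while an application might prefer a handler that only logs the offset; leaving that choice to the caller is the flexibility the proposal is after.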
-Andrei

On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:
Yeah, but we can't only tailor to the default. If you cast "abc" to an integer today, PHP will do the conversion (e.g. 0). I think we should stick to that paradigm and provide users with validation methods if they want to strictly validate...
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

