[PHP-DEV] [PATCH] Bug 43477 - Unicode error mode ignored
Attached is a simple proposed patch that fixes Bug 43477. Basically, the code that set the error mode of the ICU converter was giving it an instruction (the context parameter) to only skip or substitute if the code point was not represented in the new encoding. However, it still was returning an error for illegal sequences. The test suite returns the same results with or without the patch. Test also attached. -Stephen Bach Index: Zend/zend_unicode.c === RCS file: /repository/ZendEngine2/zend_unicode.c,v retrieving revision 1.37 diff -u -r1.37 zend_unicode.c --- Zend/zend_unicode.c 31 Dec 2007 07:12:07 - 1.37 +++ Zend/zend_unicode.c 15 Mar 2008 23:37:36 - @@ -47,16 +47,16 @@ case ZEND_CONV_ERROR_SKIP: if (direction == ZEND_FROM_UNICODE) -ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SKIP, UCNV_SKIP_STOP_ON_ILLEGAL, NULL, NULL, &status); +ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SKIP, NULL, NULL, NULL, &status); else -ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SKIP, UCNV_SKIP_STOP_ON_ILLEGAL, NULL, NULL, &status); +ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SKIP, NULL, NULL, NULL, &status); break; case ZEND_CONV_ERROR_SUBST: if (direction == ZEND_FROM_UNICODE) -ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SUBSTITUTE, UCNV_SUB_STOP_ON_ILLEGAL, NULL, NULL, &status); +ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SUBSTITUTE, NULL, NULL, NULL, &status); else -ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SUBSTITUTE, UCNV_SUB_STOP_ON_ILLEGAL, NULL, NULL, &status); +ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SUBSTITUTE, NULL, NULL, NULL, &status); break; case ZEND_CONV_ERROR_ESCAPE_UNICODE: --TEST-- Bug #43477 (Unicode error mode) --FILE-- --EXPECT-- unicode(0) "" -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Bug 43477 - Unicode error mode ignored
I'm just suggesting that other error modes should do what they claim to do. Stopping on an illegal sequence is fine, unless the user had called a function telling the converter to do something else. U_CONV_ERROR_STOP: stops on illegal character (the default) U_CONV_ERROR_ESCAPE_*: 5 different modes that escape the illegal sequence in various ways Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way? -Stephen On Tuesday 18 March 2008 01:43:13 pm Andrei wrote: > Why would we not want to stop on illegal sequences? > > -Andrei > > Stephen Bach wrote: > > Attached is a simple proposed patch that fixes Bug 43477. Basically, the > > code that set the error mode of the ICU converter was giving it an > > instruction (the context parameter) to only skip or substitute if the > > code point was not represented in the new encoding. However, it still was > > returning an error for illegal sequences. > > > > The test suite returns the same results with or without the patch. Test > > also attached. > > > > -Stephen Bach -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Bug 43477 - Unicode error mode ignored
Sorry for the ambiguity. Allow me to clarify: I meant that U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST should work the same as the other error modes. Otherwise, what's the point of having them? -Stephen On Tuesday 18 March 2008 06:02:36 pm Geoffrey wrote: > On 18 Mar 2008, at 19:37, Stephen Bach wrote: > > Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way? > > I guess: U_CONV_ERROR_SKIP is just U_CONV_ERROR_SUBST with the > substitution string as nothing, though I expect slight speed gains > could be made by keeping them separate (due to no attempt to even add > anything after coming across an invalid sequence — though the speed > gains will be very slight). > > > -- > Geoffrey Sneddon > <http://gsnedders.com/> -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Bug 43477 - Unicode error mode ignored
The original patch works as it should. Substitution with user-defined characters only works in the "FROM_UNICODE" case. When converting to Unicode, U+FFFD is always substituted because it is the standard substitution character. Even the ICU library does not allow this to be changed (with good reason). As for the test, *I think* the testing script would have to be Unicode-compatible and the test script would have to be in some UTF encoding to verify that U+FFFD was substituted. I don't know if this is the case yet. On Friday 21 March 2008 12:06:34 pm you wrote: > On 03/21/2008 06:28 PM, Geoffrey Sneddon wrote: > >> Patch committed, thanks. > > > > Can we test U_CONV_ERROR_SUBST too? See attached patch. Also, the bug > > should be closed. > > The patch breaks the test. > > Can you guys decide on what should work and how, I'll commit the patch > afterwards, ok? -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php