On Thu, Jun 27, 2013 at 9:20 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>
> On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters <datib...@php.net> wrote:
>
>> On Thu, Jun 27, 2013 at 4:42 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>>
>>> On Thu, Jun 27, 2013 at 12:03 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
>>>
>>> > 2013/6/27 Kris Craig <kris.cr...@gmail.com>
>>> >
>>> >> I just noticed that htmlspecialchars_decode doesn't convert entities
>>> >> like &#10; and &#13;.
>>> >
>>> > I think htmlspecialchars_decode() only decodes
>>> >
>>> > ext/standard/html_tables.h
>>> > static const entity_stage3_row stage3_table_be_apos_00000[] = {
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
>>> > };
>>> >
>>> > IIRC
>>> > I may be wrong.
>>> >
>>> >> Is there a bitmask I'm missing or are those simply not
>>> >> supported right now?
>>> >> If the latter, any thoughts on adding something along the lines of
>>> >> ENT_ALL to convert all valid entities from/to their respective
>>> >> characters?
>>> >
>>> > What you are looking for is html_entity_decode(), I think.
>>> >
>>> > $ php -n -r 'var_dump(html_entity_decode("&#10;="));'
>>> > string(2) "
>>> > ="
>>> >
>>>
>>> Yeah I tried html_entity_decode already, but it just returned NULL. On the
>>> same input string, htmlspecialchars_decode returned the input string but
>>> with *some* special characters decoded; 10 and 13 ("\r\n", I think) were
>>> left in their encoded state. I'm not sure why there wouldn't be an option
>>> to decode all html special characters.
>>
>> The html_entity_decode() function shouldn't return NULL, but even an
>> empty string sounds like a bug, could you file a report for this and
>> provide a reproducible test code?
>
> Yeah I admit it could be an empty string as opposed to NULL. I wasn't
> using a var_dump() so I just assumed.
>
> I'll take another look at it and get those details.
>
> --Kris

Ok, I've confirmed what's happening. If I include &#10; and/or &#13; in the
string argument passed to html_entity_decode(), it returns an empty string,
presumably because those entities are not recognized by the function.
Here's what the manual says:

> If the input string contains an invalid code unit sequence within the given
> encoding an empty string will be returned, unless either the ENT_IGNORE or
> ENT_SUBSTITUTE flags are set.

Can somebody explain why ENT_IGNORE isn't enabled by default? What's the
use case for having it return an empty string simply because the input
contained one or more unrecognized entities? If anything, shouldn't it at
least return FALSE instead?

I would say that the bug here appears to be the fact that those valid
entities are not currently recognized, which makes me curious as to whether
there might be other valid entities that aren't supported as well.

--Kris
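
For the reproducible test requested earlier in the thread, a script along these lines might be a reasonable starting point. It is only a sketch: the sample string and variable names are made up for illustration, and the flags and encoding passed to html_entity_decode() are assumptions, not necessarily what the failing call used.

```php
<?php
// Hypothetical sample input: named entities plus the numeric references
// for line feed (&#10;) and carriage return (&#13;).
$input = '&lt;b&gt;line one&#10;&#13;line two&lt;/b&gt;';

// htmlspecialchars_decode() only covers the handful of "special" entities
// (&amp;, &quot;, &lt;, &gt; and, depending on flags, the single quote),
// so the numeric references are expected to stay encoded here.
$special = htmlspecialchars_decode($input);

// html_entity_decode() is the broader function; flags and charset are
// spelled out so the result does not depend on ini defaults.
$all = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// var_dump() shows type and length, so NULL, false and string(0) ""
// cannot be mistaken for one another.
var_dump($special);
var_dump($all);
var_dump($all === '');
```

If the last var_dump() prints bool(true) for a given input, that input together with the PHP version would be the useful part of a bug report. If it prints bool(false), the empty string probably comes from something else in the original input, such as its byte encoding.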