On Fri, Sep 16, 2005 at 10:06:40AM +0200, Ron Korving wrote:
> Hi,
> 
> I found a bug in DOM. It surprises me that it's never been seen and/or fixed
> before. I can't find anything about it in the PHP bugtracker anyway. The
> reason I'm posting here and not writing a bug report is that I'm not sure if
> this is a problem in the PHP extension or the DOM library itself. In the
> latter case there's nothing anybody here can do, I guess.
> 
> This is the situation:
> 
> <?php
>   // Instantiate first; calling loadHTML() statically is not allowed in PHP 8.
>   $doc = new DOMDocument();
>   $doc->loadHTML('<html><body>&nbsp;</body></html>');
>   echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
> 
>   $doc = new DOMDocument();
>   $doc->loadHTML('<html><body>foo&nbsp;bar</body></html>');
>   echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
> ?>
> 
> Output:
> 
> 'Â '
> 'foo bar'
> 
> Where the heck do these 'Â's come from when it parses an &nbsp; ? I hope
> anyone can shed some light on the next step to be taken in order to fix
> this.

Not a bug. The two bytes shown as 'Â ' are a UTF-8-encoded nbsp (0xC2 0xA0)
being displayed as Latin-1. Recode the string, or tell your output device to
display UTF-8-encoded strings properly.
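
A minimal sketch of both options, assuming the iconv extension is available
and that the output device expects ISO-8859-1 (neither is stated in the
original report). The variable names are only for illustration:

```php
<?php
// U+00A0 (no-break space) encoded as UTF-8 is the two bytes 0xC2 0xA0.
// Read as Latin-1, 0xC2 is 'Â' and 0xA0 is the no-break space itself.
$nbsp = "\xC2\xA0";
echo bin2hex($nbsp), "\n";

// Option 1: recode to ISO-8859-1 before printing to a Latin-1 device.
// Assumes every character in the string exists in ISO-8859-1.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $nbsp);
echo bin2hex($latin1), "\n";

// Option 2: leave the string as UTF-8 and declare the charset instead,
// e.g. when the output device is a browser:
// header('Content-Type: text/html; charset=utf-8');
?>
```

The first line printed is c2a0 and the second is a0, i.e. the same
character in two encodings.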

greetings

> Thanks,
> 
> Ron Korving
> 
> -- 
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

