On Fri, Sep 16, 2005 at 10:06:40AM +0200, Ron Korving wrote:
> Hi,
>
> I found a bug in DOM. It surprises me that it's never been seen and/or fixed
> before. I can't find anything about it in the PHP bugtracker anyway. The reason
> why I'm posting here and not writing a bugreport, is because I'm not sure if
> this is a problem in the PHP-extension or the DOM-library itself. In the
> latter case there's nothing anybody here can do, I guess.
>
> This is the situation:
>
> <?php
> $doc = DOMDocument::loadHTML('<html><body>&nbsp;</body></html>');
> echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
>
> $doc = DOMDocument::loadHTML('<html><body>foo&nbsp;bar</body></html>');
> echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
> ?>
>
> Output:
>
> 'Â '
> 'fooÂ bar'
>
> Where the heck do these 'Â's come from when it parses an &nbsp;? I hope
> anyone can shed some light on the next step to be taken in order to fix
> this.

Not a bug. The two bytes 'Â ' are a UTF-8-encoded nbsp (0xC2 0xA0) being
displayed in a single-byte encoding such as ISO-8859-1. Recode the string, or
tell your output device to display UTF-8-encoded strings properly.

Greetings

> Thanks,
>
> Ron Korving
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
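
To make the two suggestions concrete, here is a minimal sketch. It assumes a current PHP with the mbstring extension loaded (so the document is loaded through an instance rather than the static call above), and the variable names are only illustrative. Dumping the bytes shows that textContent holds UTF-8, and that recoding to ISO-8859-1 collapses the nbsp back to the single byte 0xA0.

```php
<?php
// Load the same markup as in the report, via an instance (assumption: modern PHP).
$doc = new DOMDocument();
$doc->loadHTML('<html><body>foo&nbsp;bar</body></html>');
$text = $doc->getElementsByTagName('body')->item(0)->textContent;

// DOM hands strings back as UTF-8, so U+00A0 arrives as the bytes 0xC2 0xA0.
echo bin2hex($text), "\n";   // 666f6fc2a0626172

// Option 1: keep UTF-8 and declare it to the client (web output only):
// header('Content-Type: text/html; charset=UTF-8');

// Option 2: recode to ISO-8859-1 (assumption: mbstring is available).
$latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // 666f6fa0626172
```

Which option fits depends on what the rest of the page or terminal expects; declaring UTF-8 is usually the less lossy choice, since ISO-8859-1 cannot represent most characters a document may contain.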