Your message dated Sun, 21 Feb 2010 19:16:48 +0100 with message-id <[email protected]> and subject line Re: Bug#534721: libhpricot-ruby1.8: Hpricot's XML parser fails to parse simple, valid XML has caused the Debian Bug report #534721, regarding libhpricot-ruby1.8: Hpricot's XML parser fails to parse simple, valid XML to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)

-- 
534721: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534721
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: libhpricot-ruby1.8
Version: 0.8-2
Severity: grave
Justification: renders package unusable

This bug also applies to libhpricot-ruby1.9.

Problems:
 - Valid XML is rendered invalid.
 - XML is no longer parseable.
 - Invalid XML is not rejected by default (required by the standard). (minor)

Workaround:
 $ aptitude install libhpricot-ruby1.8=0.6-2

Discussion:

Closing tags are sometimes not parsed correctly; causing the parser to
"helpfully" add closing tags. Whether this happens or not seems to be
pseudorandom:

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<aaaa></aaaa>')"
 <aaaa></aaaa>
 $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>')"
 <zzzz></zzzz></zzzz>

The effect is similar to the (incorrect) behaviour when it detects
malformed XML:

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
 <a></b></a>
 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a>b')"
 <a>b</a>

The unparsed tag appears to be treated like <zzzz/>:

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>').search('/zzzz')"
 <zzzz></zzzz></zzzz>
 $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>').search('/zzzz/zzzz')"
 </zzzz>

This causes the nesting to break, rendering most XML completely
unparseable:

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>')"
 <a><zzzz></zzzz><b></b></zzzz></a>
 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>').search('/a/b')"
 (no output)
 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>').search('/a/zzzz/b')"
 <b></b>

This might be related to how Hpricot treats unrecognized closing tags.
0.6-2 closes the correct tag, ignoring the contents of the closing tag
(this is also invalid behaviour for an XML parser):

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
 <a></a>

0.8-2 is broken as above:

 $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
 <a></b></a>

I suspect the problem is in hpricot_scan.so, but hpricot_scan.c is full
of auto-generated code.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable')
Architecture: i386 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libhpricot-ruby1.8 depends on:
ii  libc6        2.9-12       GNU C Library: Shared libraries
ii  libruby1.8   1.8.7.174-1  Libraries necessary to run Ruby 1.

libhpricot-ruby1.8 recommends no packages.

libhpricot-ruby1.8 suggests no packages.

-- no debconf information
--- End Message ---
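
The outputs quoted in the report above can be checked for tag balance without Hpricot installed. What follows is a minimal sketch, not part of Hpricot or of the original report: `balanced_tags?` is a hypothetical helper that handles only bare tags like those in the examples (no attributes, comments, or CDATA), so it is far from a full well-formedness check.

```ruby
# Hypothetical helper (not an Hpricot API): returns true when every closing
# tag in +markup+ matches the most recently opened, still-open tag and no
# tag is left open at the end. Only bare tags such as <a>, </a> and <a/>
# are recognised; everything else is ignored.
def balanced_tags?(markup)
  open_tags = []
  markup.scan(%r{<(/?)([A-Za-z][\w.-]*)\s*(/?)>}) do |closing, name, self_closing|
    next unless self_closing.empty? # <name/> opens and closes itself
    if closing.empty?
      open_tags.push(name)
    elsif open_tags.pop != name
      return false                  # stray or mismatched closing tag
    end
  end
  open_tags.empty?
end

# Strings copied from the report: one input and three outputs of 0.8-2.
[
  '<aaaa></aaaa>',
  '<zzzz></zzzz></zzzz>',
  '<a></b></a>',
  '<a><zzzz></zzzz><b></b></zzzz></a>'
].each do |markup|
  puts "#{markup} balanced: #{balanced_tags?(markup)}"
end
```

Under these assumptions only the first string is balanced; the other three each contain a closing tag with no matching open tag, which is the "valid XML is rendered invalid" symptom described in the report.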
--- Begin Message ---
Package: libhpricot-ruby1.8
Version: 0.8.1-1

On Fri, Jun 26, 2009 at 06:16:08PM +0100, T Chan wrote:
> Closing tags are sometimes not parsed correctly; causing the parser to
> "helpfully" add closing tags. Whether this happens or not seems to be
> pseudorandom:

Hi T Chan,
  this bug has been fixed upstream in version 0.8.1, which is already in
Debian (in fact Debian has a strictly newer version than that); here is
the excerpt from the upstream changelog about the fix:

  = 0.8.1
  === 3 April, 2009
  * solve issue #3 with bogus etags being preserved in `to_s` rather
    than just `to_original_html`.

I've checked that (in an i386 chroot) with the most recent unstable
version and I confirm the bug is no longer reproducible. I'm therefore
closing this bug report.

Note that the patch applied by upstream is not the same as the one
proposed in the bug log (but I haven't been able to isolate it, given
that the upstream website seems to be down ATM).

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime
--- End Message ---
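
To repeat the "no longer reproducible" check on another installation, the inputs from the report can be compared against what the parser prints back. This is only a sketch: `REPORTED_INPUTS`, `round_trip_failures` and `hpricot_serializer` are hypothetical names introduced here, and it assumes that a fixed version serializes these simple inputs back unchanged, as 0.8-2 already did for `<aaaa></aaaa>` in the report.

```ruby
# Well-formed inputs taken from the bug report.
REPORTED_INPUTS = [
  '<aaaa></aaaa>',
  '<zzzz></zzzz>',
  '<a><zzzz></zzzz><b></b></a>'
].freeze

# Hypothetical helper: returns the inputs whose serialized form differs from
# the input. +serializer+ is any callable that takes and returns a String.
def round_trip_failures(inputs, serializer)
  inputs.reject { |xml| serializer.call(xml) == xml }
end

# Serializer backed by Hpricot, using the same calls as the report
# (Hpricot.XML and to_s). The require runs only when this method is called,
# so the helper above stays usable without the package installed.
def hpricot_serializer
  require 'hpricot'
  ->(xml) { Hpricot.XML(xml).to_s }
end
```

With the package installed, `round_trip_failures(REPORTED_INPUTS, hpricot_serializer)` is expected to return an empty array on a fixed version, and to list the `zzzz` inputs on 0.8-2 if it behaves as described in the report.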

