Your message dated Sun, 21 Feb 2010 19:16:48 +0100
with message-id <[email protected]>
and subject line Re: Bug#534721: libhpricot-ruby1.8: Hpricot's XML parser fails 
to parse simple, valid XML
has caused the Debian Bug report #534721,
regarding libhpricot-ruby1.8: Hpricot's XML parser fails to parse simple, valid 
XML
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
534721: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534721
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: libhpricot-ruby1.8
Version: 0.8-2
Severity: grave
Justification: renders package unusable


This bug also applies to libhpricot-ruby1.9.

Problems:
- Valid XML is rendered invalid.
- XML is no longer parseable.
- Invalid XML is not rejected by default (required by the standard). (minor)

Workaround:
  $ aptitude install libhpricot-ruby1.8=0.6-2

Discussion:

Closing tags are sometimes not parsed correctly, causing the parser to 
"helpfully" add closing tags. Whether this happens or not seems to be 
pseudorandom:
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<aaaa></aaaa>')"
  <aaaa></aaaa>
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>')"
  <zzzz></zzzz></zzzz>

The effect is similar to the (incorrect) behaviour when it detects malformed 
XML:
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
  <a></b></a>
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<a>b')"
  <a>b</a>
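
For comparison, a strict parser would refuse both of the inputs above rather
than repair them. The following is a minimal sketch of that check in plain
Ruby. It is not Hpricot's code: `well_formed_tags?` is a hypothetical helper
that only understands bare tags and text (no attributes, comments, CDATA or
self-closing tags).

```ruby
# Hypothetical helper, not part of Hpricot: checks that every opening tag
# is closed by a matching closing tag, in the right order.
# Only bare tags such as <a> and </a> are understood.
def well_formed_tags?(xml)
  stack = []
  xml.scan(%r{<(/?)([A-Za-z][\w.-]*)>}) do |slash, name|
    if slash.empty?
      stack.push(name)   # opening tag: remember it
    elsif stack.pop != name
      return false       # closing tag does not match the innermost open tag
    end
  end
  stack.empty?           # leftover open tags mean an unclosed element
end

puts well_formed_tags?('<aaaa></aaaa>')  # true
puts well_formed_tags?('<a></b>')        # false
puts well_formed_tags?('<a>b')           # false
```

A real XML parser would raise an error carrying the position of the mismatch;
the boolean here only illustrates the stack discipline.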

The unparsed tag appears to be treated like <zzzz/>:
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>').search('/zzzz')"
  <zzzz></zzzz></zzzz>
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<zzzz></zzzz>').search('/zzzz/zzzz')"
  </zzzz>

This causes the nesting to break, rendering most XML completely unparseable:
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>')"
  <a><zzzz></zzzz><b></b></zzzz></a>
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>').search('/a/b')"
  (no output)
  $ ruby -e "require 'hpricot'; print Hpricot.XML('<a><zzzz></zzzz><b></b></a>').search('/a/zzzz/b')"
  <b></b>
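
The well-formed cases in this report can be collected into a small round-trip
check. This is only a sketch: `roundtrip_failures` and `CASES` are
hypothetical names, and the parser is passed in as a block so the script does
not depend on any particular Hpricot version. With Hpricot installed, the
block would presumably be `{ |xml| Hpricot.XML(xml).to_s }`.

```ruby
# Inputs taken from the examples in this report; each is well-formed XML
# that should survive a parse-and-print round trip unchanged.
CASES = [
  '<aaaa></aaaa>',
  '<zzzz></zzzz>',
  '<a><zzzz></zzzz><b></b></a>'
].freeze

# Hypothetical helper: returns a hash of input => output for every case
# whose serialised output differs from the input.
def roundtrip_failures(cases)
  cases.each_with_object({}) do |xml, failures|
    output = yield(xml)
    failures[xml] = output unless output == xml
  end
end

# Stand-in that replays the 0.8-2 output quoted in this report.
# Swap the block for a real parser call to test an installed version.
observed = {
  '<zzzz></zzzz>' => '<zzzz></zzzz></zzzz>',
  '<a><zzzz></zzzz><b></b></a>' => '<a><zzzz></zzzz><b></b></zzzz></a>'
}
failures = roundtrip_failures(CASES) { |xml| observed.fetch(xml, xml) }
failures.each { |input, output| puts "#{input} -> #{output}" }
```

An empty result means every case came back unchanged.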

This might be related to how Hpricot treats unrecognized closing tags.
  0.6-2 closes the correct tag, ignoring the contents of the closing tag (this 
is also invalid behaviour for an XML parser):
    $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
    <a></a>
  0.8-2 is broken as above:
    $ ruby -e "require 'hpricot'; print Hpricot.XML('<a></b>')"
    <a></b></a>

I suspect the problem is in hpricot_scan.so, but hpricot_scan.c is full of 
auto-generated code.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable')
Architecture: i386 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libhpricot-ruby1.8 depends on:
ii  libc6                        2.9-12      GNU C Library: Shared libraries
ii  libruby1.8                   1.8.7.174-1 Libraries necessary to run Ruby 1.

libhpricot-ruby1.8 recommends no packages.

libhpricot-ruby1.8 suggests no packages.

-- no debconf information



--- End Message ---
--- Begin Message ---
Package: libhpricot-ruby1.8
Version: 0.8.1-1

On Fri, Jun 26, 2009 at 06:16:08PM +0100, T Chan wrote:
> Closing tags are sometimes not parsed correctly; causing the parser to
> "helpfully" add closing tags. Whether this happens or not seems to be
> pseudorandom:

Hi T Chan,
  this bug has been fixed upstream in version 0.8.1, which is already in
Debian (in fact Debian has a strictly newer version than that); here is
the excerpt from the upstream changelog about the fix:

  = 0.8.1
  === 3 April, 2009
  * solve issue #3 with bogus etags being preserved in `to_s` rather than
  just `to_original_html`.         

I've checked this (in an i386 chroot) with the most recent unstable
version and I confirm the bug is no longer reproducible. I'm therefore
closing this bug report.  Note that the patch applied by upstream is not
the same as the one proposed in the bug log (but I haven't been able to
isolate it, given that the upstream website seems to be down ATM).

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime


--- End Message ---
