tags 249234 patch thanks Hi,
here's a patch (attached) that makes libxtm-perl 0.36 installable in sid again. I found four issues that needed fixing: #1: XTM::XML::MemoryBuilder incompatibility with the current version of libxml-libxml-perl #2: unicode encoding issues in t/04astma.t and t/09ltm.t #3: XTM::Path bug that was triggered by the perl version switch #4: incompatibility with the current version of libparse-recdescent-perl in XTM::Path, XTM::AsTMa::Parser and XTM::LTM::Parser I'll explain the issues below; apologies for the length of the mail :) Issue #1 This shows up as test failures in t/03xml.t. The failed tests try to make sure that an exception is raised when XTM::XML::MemoryBuilder parses an invalid XML document. The tests fail because the default SAX parser offered by the current libxml-libxml-perl version, XML::LibXML::SAX, doesn't raise exceptions by default, although other SAX parsers do. I have filed bug #297885 against libxml-libxml-perl about this. The workaround in the patch is to define an explicit error method in XTM::XML::MemoryBuilder. Issue #2 This has to do with the much improved Unicode handling in Perl 5.8. The tests in t/04astma.t t/09ltm.t are broken, and there is no sense in trying to get them to work under Perl 5.8. In t/04astma.t, the tests try to make sure that 8-bit characters from different encodings end up as UTF-8 inside the XTM::AsTMa module. The test is broken because it uses the \x{C3A4} notation, which even in Perl 5.6 stood for 'the Unicode character C3A4'. This seems to be some short of an oriental glyph in my font, and has nothing to do with the Unicode presentation of "LATIN SMALL LETTER A WITH DIAERESIS". The tests happen to work in Perl 5.6 because it cannot really tell the difference between \x{C3}\x{A4} and \x{C3A4}. Perl 5.8 can. The fix is to use the former notation (\x{C3}\x{A4}) in the tests, as it works for both 5.6 and 5.8. In t/09ltm.t the encoding tests relies on another Perl 5.6 peculiarity: under "use utf8" the \x{E4} notation inside a regexp seems to mean "the Unicode character E4 (LATIN SMALL LETTER A WITH DIAERESIS) encoded in UTF-8". In Perl 5.8 the "use utf8" doesn't seem to affect the \x{} notation, so \x{E4} just means "the Unicode character E4". Here the fix is to remove the "use utf8" and match with the explicit UTF-8 coding \x{C3}\x{A4}. Note that fixing just the tests means that the XTM modules don't make any use of Perl 5.8 Unicode features. The characters are still stored as their UTF-8 encoded byte sequences, not in the Perl 5.8 internal Unicode representation. It would be quite easy to decode them with Encode::decode() while parsing, but this wouldn't work at all with Perl 5.6. Thus I have opted for what's IMO the least invasive and most compatible solution. Issue #3 This shows up as a test failure in t/08topic.t while calling XTM::Path::create(). AIUI, the code creates nodes in __create() by traversing the grammar tree from the leafs upwards, using the path looked up from XTM::grammar. The problem is that the code isn't prepared for the situation where a node has several possible parents (as text() does), and it blindly picks the first one instead of looking at the path to determine the correct one. Furthermore, the order of the parents is undefined: it is originally initialized in XTM::grammar::recompute_isin() by turning "keys %$adjacency" into an array without any sorting. In Perl 5.6 the right parent happens to be the first, but in Perl 5.8 the order of the list is changed. The bug can be triggered under Perl 5.6 by sorting the elements explicitly in the "wrong" order with "reverse sort keys %$adjacency". The patch fixes this bug by picking the parent that exists in the path. If none or several parents exist in the path, it falls back to the old behaviour. I think this shouldn't ever happen, but I wasn't sure enough to make the code raise an exception. I hope this makes sense - the solution seems right to me, but it should probably be checked with upstream :) Issue #4 This showed up after modifying lib/XTM/Path.pm and suddenly getting dozens of test errors. Apparently the three CParser.pm files in the tree are automatically created by Parse::RecDescent, and the grammars aren't compatible with the current Parse::RecDescent version. The corresponding entry in Parse::RecDescent upstream changelog is this: 1.90 Tue Mar 25 01:17:38 2003 - BACKWARDS INCOMPATIBLE CHANGE: The key of an %item entry for a repeated subrule now includes the repetition specifier. For example, in: sentence: subject verb word(s) the various matched items will be stored in $item{'subject'}, $item{'verb'}, and $item{'word(s)'} (i.e. *not* in $item{'word'}, as it would have been in previous versions of the module). (thanks Anthony) The patch fixes this in all the three grammars: XTM::Path, XTM::LTM::Parser and XTM::AsTMa::Parser. They now require at least version 1.90 of Parse::RecDescent. I hope I found all the occurrences; at least the tests are now successful with regenerated CParser.pm files as well. I have also fixed the Depends and Conflicts lines in debian/control, so the package doesn't conflict with Perl 5.8 anymore. I have run the tests succesfully with both Perl 5.6 and Perl 5.8. I hope this helps libxtm-perl get back into sarge, but don't personally mind too much if it doesn't. I don't use the package myself, and originally looked into the problems as an opportunity to learn how Perl 5.8 handles Unicode. Then I just wanted to finish what I had started, as it seemed like a nice challenge :) Cheers, -- Niko Tyni [EMAIL PROTECTED]
--- debian/control 2005/03/03 10:29:42 1.1 +++ debian/control 2005/03/03 10:37:46 @@ -2,14 +2,12 @@ Section: perl Priority: extra Maintainer: Alexander Zangerl <[EMAIL PROTECTED]> -Build-Depends-Indep: debhelper (>> 3.0.0), perl, libtest-simple-perl (>= 0.40), libxml-sax-perl (>= 0.03), libxml-libxml-perl (>= 1.40), libio-string-perl (>= 1.01), libwww-perl, libparse-recdescent-perl (>= 1.80), liburi-perl (>= 1.18), libxml-twig-perl (>= 3.01), libxml-writer-perl (>= 0.4), libtext-iconv-perl (>= 1.2), libfile-slurp-perl -Build-Conflicts-Indep: perl (>= 5.8) +Build-Depends-Indep: debhelper (>> 3.0.0), perl, perl-modules (>= 5.8.0) | libtest-simple-perl (>= 0.40), libxml-sax-perl (>= 0.03), libxml-libxml-perl (>= 1.40), libio-string-perl (>= 1.01), libwww-perl, libparse-recdescent-perl (>= 1.90), liburi-perl (>= 1.18), libxml-twig-perl (>= 3.01), libxml-writer-perl (>= 0.4), libtext-iconv-perl (>= 1.2), libfile-slurp-perl Standards-Version: 3.6.1.0 Package: libxtm-perl Architecture: all -Depends: ${perl:Depends}, libxml-sax-perl (>= 0.03), libxml-libxml-perl (>= 1.40), libio-string-perl (>= 1.01), libwww-perl, libparse-recdescent-perl (>= 1.80), liburi-perl (>= 1.18), libxml-twig-perl (>= 3.01), libxml-writer-perl (>= 0.4), libtext-iconv-perl (>= 1.2), libfile-slurp-perl -Conflicts: perl (>= 5.8) +Depends: ${perl:Depends}, libxml-sax-perl (>= 0.03), libxml-libxml-perl (>= 1.40), libio-string-perl (>= 1.01), libwww-perl, libparse-recdescent-perl (>= 1.90), liburi-perl (>= 1.18), libxml-twig-perl (>= 3.01), libxml-writer-perl (>= 0.4), libtext-iconv-perl (>= 1.2), libfile-slurp-perl Description: Perl module for reading/writing Topic Maps This module consists of several classes for reading, querying and building Topic Maps in both standard XTM (XML Topic Map) format as well as --- lib/XTM/AsTMa/Parser.pm 2005/03/03 10:29:16 1.1 +++ lib/XTM/AsTMa/Parser.pm 2005/03/03 10:29:33 @@ -11,7 +11,7 @@ $VERSION = '0.08'; use Data::Dumper; -use Parse::RecDescent; +use Parse::RecDescent 1.90; use URI; use URI::Escape; @@ -40,14 +40,14 @@ # deal with the topic first my $t = new XTM::topic (id => $item{topic_id}); - foreach (@{$item{types}->[0]}) { + foreach (@{$item{'types(?)'}->[0]}) { $t->add__s (new XTM::instanceOf ( reference => new XTM::topicRef (href => "#$_"))); } $t->add__s (new XTM::instanceOf ( reference => new XTM::topicRef (href => $XTM::PSI::xtm{topic}))) unless $t->instanceOfs && @{$t->instanceOfs}; my $s = new XTM::subjectIdentity (); # maybe we need it - foreach (@{$item{topic_characteristic}}) { + foreach (@{$item{'topic_characteristic(s?)'}}) { if (ref($_) eq 'XTM::subjectIndicatorRef') { $s->add_reference_s ($_); } elsif (ref($_) eq 'XTM::topicRef') { @@ -56,8 +56,8 @@ $t->add__s ($_); } } - if (ref($item{reification}) eq 'ARRAY' && @{$item{reification}}) { - $s->add_ ( $item{reification}->[0] ); + if (ref($item{'reification(?)'}) eq 'ARRAY' && @{$item{'reification(?)'}}) { + $s->add_ ( $item{'reification(?)'}->[0] ); } $t->add_subjectIdentity ($s) if $s->references || $s->resourceRef; # only add it if we found at least one reference @@ -101,7 +101,7 @@ return $t; } - foreach my $uri (@{$item{isreification}}) { + foreach my $uri (@{$item{'isreification(s?)'}}) { push @components, _make_reifying_topic ($uri, $t->id); } @@ -126,7 +126,7 @@ my $a = new XTM::association (); my $s = new XTM::scope(); $a->add_scope ($s); - foreach (@{$item{scopes}->[0]}) { + foreach (@{$item{'scopes(?)'}->[0]}) { $s->add_reference_s (new XTM::topicRef (href => "#$_")); } $a->scope->add_reference_s (new XTM::topicRef (href => $XTM::PSI::xtm{universal_scope}) ) @@ -134,13 +134,13 @@ $a->add_instanceOf (new XTM::instanceOf (reference => new XTM::topicRef (href => "#$item{type_topic_id}"))); - foreach (@{$item{association_member}}) { + foreach (@{$item{'association_member(s)'}}) { $a->add__s ($_); } push @components, $a; - foreach my $uri (@{$item{isreification}}) { + foreach my $uri (@{$item{'isreification(s?)'}}) { push @components, _make_reifying_topic ($uri, $a->id); } @@ -175,7 +175,7 @@ my $b = new XTM::baseName (); $b->add_baseNameString (new XTM::baseNameString (string => $item{string})); $b->add_scope (new XTM::scope()); - foreach (@{$item{scopes}->[0]}) { + foreach (@{$item{'scopes(?)'}->[0]}) { $b->scope->add_reference_s (new XTM::topicRef (href => "#$_")); } $b->scope->add_reference_s (new XTM::topicRef (href => $XTM::PSI::xtm{universal_scope}) ) @@ -188,14 +188,14 @@ my $o = new XTM::occurrence (); $o->add_resource (new XTM::resourceRef (href => $item{string})); $o->add_scope (new XTM::scope()); - foreach (@{$item{scopes}->[0]}) { + foreach (@{$item{'scopes(?)'}->[0]}) { $o->scope->add_reference_s (new XTM::topicRef (href => "#$_")); } $o->scope->add_reference_s (new XTM::topicRef (href => $XTM::PSI::xtm{universal_scope}) ) unless $o->scope->references; $o->add_instanceOf (new XTM::instanceOf ( reference => new XTM::topicRef (href => - $item{type} && $item{type}->[0] ? "#$item{type}->[0]" : $XTM::PSI::xtm{occurrence} + $item{'type(?)'} && $item{'type(?)'}->[0] ? "#$item{'type(?)'}->[0]" : $XTM::PSI::xtm{occurrence} ))); $return = $o; } @@ -204,16 +204,16 @@ { my $o = new XTM::occurrence (); # fixme: need better way to figure out whether unescape needed; now looks for %0A at end.... - $o->add_resource (new XTM::resourceData (data => (@{$item{colon}}==2?URI::Escape::uri_unescape($item{string}):$item{string}))); + $o->add_resource (new XTM::resourceData (data => (@{$item{'colon(1..2)'}}==2?URI::Escape::uri_unescape($item{string}):$item{string}))); $o->add_scope (new XTM::scope()); - foreach (@{$item{scopes}->[0]}) { + foreach (@{$item{'scopes(?)'}->[0]}) { $o->scope->add_reference_s (new XTM::topicRef (href => "#$_")); } $o->scope->add_reference_s (new XTM::topicRef (href => $XTM::PSI::xtm{universal_scope}) ) unless $o->scope->references; $o->add_instanceOf (new XTM::instanceOf ( reference => new XTM::topicRef (href => - $item{type} && $item{type}->[0] ? "#$item{type}->[0]" : $XTM::PSI::xtm{occurrence} + $item{'type(?)'} && $item{'type(?)'}->[0] ? "#$item{'type(?)'}->[0]" : $XTM::PSI::xtm{occurrence} ))); $return = $o; } --- lib/XTM/LTM/Parser.pm 2005/03/03 10:29:22 1.1 +++ lib/XTM/LTM/Parser.pm 2005/03/03 10:29:33 @@ -12,7 +12,7 @@ $VERSION = '0.03'; use Data::Dumper; -use Parse::RecDescent; +use Parse::RecDescent 1.90; use URI; use XTM; @@ -37,11 +37,11 @@ my $tm = $arg{tm}; my @mentioned; - foreach my $d (@{$item{directive}}) { # walk over directives, some contain components - push @{$item{component}}, $d->{components} if ($d->{components}); + foreach my $d (@{$item{'directive(s?)'}}) { # walk over directives, some contain components + push @{$item{'component(s)'}}, $d->{components} if ($d->{components}); } - foreach my $cs (@{$item{component}}) { + foreach my $cs (@{$item{'component(s)'}}) { foreach my $c (@{$cs}) { if (ref($c) eq 'XTM::topic') { $tm->add ($c); @@ -92,13 +92,13 @@ mergemap_directive : '#MERGEMAP' tau_expr tm_format(?) { -# warn "MERGEMAP: $item{tau_expr}, format $item{tm_format}"; +# warn "MERGEMAP: $item{tau_expr}, format $item{'tm_format(?)'}"; # use Data::Dumper; -# print Dumper $item{tm_format}; +# print Dumper $item{'tm_format(?)'}; my $tm2; # will hold the new map - if (scalar @{$item{tm_format}}) { - my $format = $item{tm_format}->[0]; + if (scalar @{$item{'tm_format(?)'}}) { + my $format = $item{'tm_format(?)'}->[0]; use URI; my $uri = new URI ($item{tau_expr}); $uri->scheme ('file') unless $uri->scheme; # default is 'file:' @@ -134,16 +134,16 @@ my $t = new XTM::topic (id => $item{name}); - if (ref($item{types})) { - foreach (@{$item{types}->[0]}) { + if (ref($item{'types(?)'})) { + foreach (@{$item{'types(?)'}->[0]}) { $t->add__s (new XTM::instanceOf ( reference => new XTM::topicRef (href => "#$_"))); } }; $t->add__s (new XTM::instanceOf ( reference => new XTM::topicRef (href => $XTM::PSI::xtm{topic}))) unless $t->instanceOfs && @{$t->instanceOfs}; - if (ref($item{topname})) { - foreach my $bn (@{$item{topname}}) { + if (ref($item{'topname(s?)'})) { + foreach my $bn (@{$item{'topname(s?)'}}) { my $b = new XTM::baseName (); $b->add_baseNameString (new XTM::baseNameString (string => $bn->{basename})); $b->add_scope (new XTM::scope()); @@ -160,15 +160,15 @@ #use Data::Dumper; - #print Dumper $item{subject}; + #print Dumper $item{'subject(?)'}; my $s = new XTM::subjectIdentity (); # maybe we need it - if (ref ($item{subject}) && @{$item{subject}}) { - $s->add_ ( $item{subject}->[0]); + if (ref ($item{'subject(?)'}) && @{$item{'subject(?)'}}) { + $s->add_ ( $item{'subject(?)'}->[0]); } - if (ref($item{indicator})) { - foreach my $sin (@{$item{indicator}}) { + if (ref($item{'indicator(s?)'})) { + foreach my $sin (@{$item{'indicator(s?)'}}) { $s->add_reference_s ($sin); } } @@ -217,10 +217,10 @@ my $a = new XTM::association; # use Data::Dumper; -# warn "scope : ".Dumper $item{scope}; +# warn "scope : ".Dumper $item{'scope(?)'}; my $s = new XTM::scope; - foreach my $scope (@{$item{scope}} ? @{$item{scope}->[0]}: ()) { + foreach my $scope (@{$item{'scope(?)'}} ? @{$item{'scope(?)'}->[0]}: ()) { $s->add_reference_s (new XTM::topicRef (href => "#$scope")); } unless ($s->references) { @@ -251,9 +251,9 @@ { my $m = new XTM::member (); -#warn "type is ". Dumper $item{type}; - if (scalar @{$item{type}}) { - my $t = new XTM::topicRef (href => "#$item{type}->[0]"); +#warn "type is ". Dumper $item{'type(?)'}; + if (scalar @{$item{'type(?)'}}) { + my $t = new XTM::topicRef (href => "#" . $item{'type(?)'}->[0]); my $r = new XTM::roleSpec (); $r->add_reference ($t); $m->add_roleSpec ($r); @@ -280,7 +280,7 @@ my $o = new XTM::occurrence (); $o->add_resource ($item{resource}); $o->add_scope (new XTM::scope()); - foreach (@{$item{scope}->[0]}) { + foreach (@{$item{'scope(?)'}->[0]}) { $o->scope->add_reference_s (new XTM::topicRef (href => "#$_")); } $o->scope->add_reference_s (new XTM::topicRef (href => $XTM::PSI::xtm{universal_scope}) ) --- lib/XTM/Path.pm 2005/03/03 10:29:10 1.1 +++ lib/XTM/Path.pm 2005/03/03 10:29:33 @@ -362,7 +362,7 @@ path : step(s) { - $return = $item{step}; + $return = $item{'step(s)'}; } step : axis(?) relative_path @@ -377,7 +377,7 @@ { $return = { %{$item{'XTM_thing'}}, predicates => []}; - foreach (@{$item{predicate}}) { + foreach (@{$item{'predicate(s?)'}}) { push @{$return->{predicates}}, $_; } @@ -423,8 +423,8 @@ simple_expr : path op_value(?) { $return = [ $item{path} ]; - if ($item{op_value} && $item{op_value}->[0]) { - push @$return, @{$item{op_value}->[0]}; + if ($item{'op_value(?)'} && $item{'op_value(?)'}->[0]) { + push @$return, @{$item{'op_value(?)'}->[0]}; } } @@ -469,7 +469,7 @@ $parser = XTM::Path::CParser->new(); }; if ($@) { warn "could not find CParser ($@)"; - use Parse::RecDescent; + use Parse::RecDescent 1.90; $parser = new Parse::RecDescent ($xtmpath_grammar) or die "XTM::Path: Problem in grammar"; }; return $parser; @@ -871,8 +871,27 @@ # warn "found paths: ".join (",", Dumper $paths); if (scalar @$paths == 1) { # not immediate parent, but there is only one path # warn "found EXACTLY ONE path: ".join (",", @$paths); + my $parent; + if (scalar @$parents == 1) { + $parent = $parents->[0]; + } else { + # several possible parents + # pick the one that's mentioned in the path + my @candidates; + for my $p (@$parents) { + push @candidates, $p if grep { $_ eq $p } @{$paths->[0]}; + } + if (scalar @candidates == 1) { + $parent = $candidates[0]; + } else { + # several or no parents mentioned in the path + # this shouldn't happen; should we raise an exception? + # fall back to the "old behaviour" for now + $parent = $parents->[0]; + } + } return __create ($c, $v, - { element => $parents->[0], # call recursively with the parent in front of the path + { element => $parent, # call recursively with the parent in front of the path predicates => [] }, $s, @p); --- lib/XTM/XML/MemoryBuilder.pm 2005/03/03 10:29:03 1.1 +++ lib/XTM/XML/MemoryBuilder.pm 2005/03/03 10:29:31 @@ -721,6 +721,14 @@ " [Ln: " . $self->{LineNumber} . ", Col: " . $self->{ColumnNumber} . "]"); } +sub error { + my $self = shift; + my $message = shift; + + throw XML::SAX::Exception (Message => $message. + " [Ln: " . $self->{LineNumber} . ", Col: " . $self->{ColumnNumber} . "]"); +} + =pod =head1 AUTHOR INFORMATION --- t/04astma.t 2005/03/03 10:28:51 1.1 +++ t/04astma.t 2005/03/03 10:29:31 @@ -39,7 +39,7 @@ in: Ich chan Glaas ässe, das tuet mir nöd weeh ")); -is (@{$tm->topics('occurrence regexps /\x{C3A4}sse/')}, 1, 'single encoding'); +is (@{$tm->topics('occurrence regexps /\x{C3}\x{A4}sse/')}, 1, 'single encoding'); $tm = new XTM (tie => new XTM::AsTMa (auto_complete => 0, text => " %encoding iso8859-1 @@ -53,9 +53,9 @@ in: Mohu jíst sklo, neublí?í mi ")); -is (@{$tm->topics('occurrence regexps /\x{C3A4}sse/')}, 1, 'double encoding1'); +is (@{$tm->topics('occurrence regexps /\x{C3}\x{A4}sse/')}, 1, 'double encoding1'); is (@{$tm->topics('occurrence regexps /Mohu/')}, 1, 'double encoding2'); -is (@{$tm->topics('occurrence regexps /\x{C3AD}st/')}, 1, 'double encoding3'); +is (@{$tm->topics('occurrence regexps /\x{C3}\x{AD}st/')}, 1, 'double encoding3'); $tm = new XTM (tie => new XTM::AsTMa (auto_complete => 0, text => " --- t/09ltm.t 2005/03/03 10:28:55 1.1 +++ t/09ltm.t 2005/03/03 10:38:12 @@ -1,6 +1,5 @@ # -*-perl-*- use strict; -use utf8; use warnings 'all'; use Test::More tests => 40; use XTM; @@ -50,7 +49,7 @@ })); -like ($tm->topic ('ltm')->occurrences->[0]->resource->data, qr/\x{E4}sse/, 'encoding from iso8859-1'); +like ($tm->topic ('ltm')->occurrences->[0]->resource->data, qr/\x{C3}\x{A4}sse/, 'encoding from iso8859-1'); $tm = new XTM (tie => new XTM::LTM ( text => q{ { ltm , test , "http://rumsti/" }