On Wed, 18 Apr 2012 22:23:37 +0200 Manfred Lotz <manfred.l...@arcor.de> wrote:
> On Thu, 19 Apr 2012 06:15:47 +1000
> "Owen" <rc...@pcug.org.au> wrote:
>
> > > Hi there,
> > > I've got a question about XML::Mini.
> > >
> > > When parsing an XML document I want to preserve white space, for
> > > various reasons. However, it doesn't really work.
> > >
> > > Minimal example:
> > >
> > > #!/usr/bin/perl
> > >
> > > use strict;
> > > use warnings;
> > > use Data::Dumper;
> > > use XML::Mini::Document;
> > >
> > > my $XMLString = "<book> Learning Perl </book>";
> > >
> > > my $xmlDoc = XML::Mini::Document->new();
> > >
> > > $XML::Mini::IgnoreWhitespaces = 0;
> > >
> > > # init the doc from an XML string
> > > $xmlDoc->parse($XMLString);
> > >
> > > my $xmlHash = $xmlDoc->toHash();
> > >
> > > print Dumper($xmlHash);
> > >
> > >
> > > I get the following output:
> > > $VAR1 = {
> > >   'book' => 'Learning Perl '
> > > };
> > >
> > > I would have expected
> > >   'book' => ' Learning Perl '
> > > instead.
> > >
> > > Any idea what's going wrong?
> >
> > What happens if you set $XML::Mini::IgnoreWhitespaces = 1?
> >
> > Seems to me that 1 = yes
>
> This is true.
>
> > What does the documentation say?
>
> If I set it to 1 then I get
>   'book' => 'Learning Perl'
>
> which is even worse. Please note that I don't want white space to be
> ignored.

Hm, I had no other idea but to look at the source code. I think I found
what happens.

    if ($XMLString =~ m/^\s*(<\s*([^\s>]+)([^>]+)\/\s*>|          # <unary \/>
        <\?\s*([^\s>]+)\s*([^>]*)\?>|                             # <? headers ?>
        <!--(.+?)-->|                                             # <!-- comments -->
        <!\[CDATA\s*\[(.*?)\]\]\s*>\s*|                           # CDATA
        <!DOCTYPE\s*([^\[>]*)(\[.*?\])?\s*>\s*|                   # DOCTYPE
        <!ENTITY\s*([^"'>]+)\s*(["'])([^\11]+)\11\s*>\s*|         # ENTITY
        ([^<]+))(.*)/xogsmi)                                      # plain text

IMHO, here is the bug. The leading ^\s* sits in front of the whole
alternation, so leading white space is deleted in every case. That is OK
for everything except plain text.

I changed it like this:

    if ($XMLString =~ m/(^\s*<\s*([^\s>]+)([^>]+)\/\s*>|          # <unary \/>
        ^\s*<\?\s*([^\s>]+)\s*([^>]*)\?>|                         # <? headers ?>
        ^\s*<!--(.+?)-->|                                         # <!-- comments -->
        ^\s*<!\[CDATA\s*\[(.*?)\]\]\s*>\s*|                       # CDATA
        ^\s*<!DOCTYPE\s*([^\[>]*)(\[.*?\])?\s*>\s*|               # DOCTYPE
        ^\s*<!ENTITY\s*([^"'>]+)\s*(["'])([^\11]+)\11\s*>\s*|     # ENTITY
        ([^<]+))(.*)/xogsmi)                                      # plain text

Now leading white space is deleted in all cases except plain text, and I
get:

    $VAR1 = {
      'book' => ' Learning Perl '
    };

-- 
Manfred
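
P.S. To make the effect of moving ^\s* easier to see in isolation, here
is a minimal sketch. It is not the XML::Mini source: the two regexes are
cut-down stand-ins that keep only the comment alternative and the
plain-text alternative, and plain_text() is a hypothetical helper made up
for this illustration. The input string is assumed to be what remains
after the opening <book> tag has been consumed.

```perl
use strict;
use warnings;

# Cut-down stand-ins for the two regex variants (assumption: only the
# comment and plain-text alternatives are kept; group numbers line up).
my $original = qr/^\s*(<!--(.+?)-->|([^<]+))(.*)/s;    # ^\s* in front of everything
my $patched  = qr/(^\s*<!--(.+?)-->|([^<]+))(.*)/s;    # ^\s* only in front of markup

# Hypothetical helper: return what the plain-text group ($3) captured,
# or undef if the regex did not match or another alternative matched.
sub plain_text {
    my ($re, $input) = @_;
    return $input =~ $re ? $3 : undef;
}

my $input = " Learning Perl </book>";

printf "original: [%s]\n", plain_text($original, $input) // '';
printf "patched:  [%s]\n", plain_text($patched,  $input) // '';
```

With the original form the leading blank is eaten by ^\s* before the
plain-text group starts, so this prints "original: [Learning Perl ]".
With the patched form the comment alternative fails and the plain-text
group starts at the first character, so it prints
"patched:  [ Learning Perl ]".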