I used mostly the same approach, but without uri_(un)escape it doesn't work.
If I understand correctly, in order to parse the feed I need to apply escape
methods to the broken elements, am I right?

Is there any module with a magic function like fixBrokenXML()?

Thanks
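
To make the question concrete, here is a minimal sketch of what such a
fix-up step might look like. fix_broken_xml() is a hypothetical helper, not
a function from any CPAN module, and it assumes the breakage is limited to
control characters that XML 1.0 forbids and to bare ampersands. It does not
treat CDATA sections specially, so ampersands inside CDATA would be escaped
as well.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): best-effort clean-up of
# text that is almost XML. Assumes only two kinds of damage.
sub fix_broken_xml {
    my ($xml) = @_;

    # Delete control characters that XML 1.0 does not allow
    # (everything below 0x20 except tab, line feed and carriage return).
    $xml =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;

    # Escape every ampersand that does not start a named entity or a
    # decimal/hexadecimal character reference.
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $xml;
}

# Prints: <link>http://example.com/?a=1&amp;b=2</link>
print fix_broken_xml("<link>http://example.com/?a=1&b=2</link>"), "\n";
```

Something like this would run on the raw file contents before they are
handed to the parser, in the same place as the uri_escape substitution in
the quoted script.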

On Fri, Jun 04, 2010 at 01:01:53PM -0400, Chas. Owens wrote:
> On Fri, Jun 4, 2010 at 12:23, Roman Makurin <dro...@gmail.com> wrote:
> > Hi, here it is http://pastebin.org/307289
> >
> > On Fri, Jun 04, 2010 at 12:06:24PM -0400, Chas. Owens wrote:
> >> On Fri, Jun 4, 2010 at 10:16, Roman Makurin <dro...@gmail.com> wrote:
> >> > Hi all
> >> >
> >> > Lately I have had a big problem: I need to parse XML files
> >> > which have invalid XML characters outside of CDATA, and the XML
> >> > parser hangs every time on such files. Is there any way
> >> > to parse such files?
> >> snip
> >>
> >> Can you give an example of these invalid characters?
> >>
> >> --
> >> Chas. Owens
> >> wonkden.net
> >> The most important skill a programmer can have is the ability to read.
> >
> > --
> > If you think of MS-DOS as mono, and Windows as stereo,
> >  then Linux is Dolby Digital and all the music is free...
> >
> 
> Given that this is RSS, you should be able to get away with using a
> regex to fix the links.  This works for me:
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use XML::RSS::Parser;
> use URI::Escape qw/uri_escape uri_unescape/;
> 
> my $filename = shift;
> 
> my $xml = do {
>       open my $fh, "<", $filename
>               or die "could not open $filename: $!";
>       local $/;
>       <$fh>;
> };
> 
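> # Percent-encode the body of every <link> element so that characters
> # which are illegal in XML cannot break the parser.
> $xml =~ s{<link>(.*?)</link>}{"<link>" . uri_escape($1) . "</link>"}seg;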
> 
> 
> my $p = XML::RSS::Parser->new
>       or die "could not create parser\n";
> 
> my $feed = $p->parse_string($xml)
>       or die "could not parse $filename:", $p->errstr, "\n";
> 
> for my $item ( $feed->query('//item') ) {
>       my $title = $item->query('title')->text_content;
>       my $link  = uri_unescape $item->query('link')->text_content;
>       printf "%60.60s: %s\n", $title, $link;
> }
> 

