On Fri, Jun 4, 2010 at 12:23, Roman Makurin <dro...@gmail.com> wrote:
> Hi, here it is http://pastebin.org/307289
>
> On Fri, Jun 04, 2010 at 12:06:24PM -0400, Chas. Owens wrote:
>> On Fri, Jun 4, 2010 at 10:16, Roman Makurin <dro...@gmail.com> wrote:
>> > Hi all
>> >
>> > Last time i have a big problem, i need parse xml files
>> > which have invalid xml chars outside of CDATA and xml
>> > parser hangs everytime on such files. Is there any way
>> > to parse such files ???
>> snip
>>
>> Can you give an example of these invalid characters?
>>
>> --
>> Chas. Owens
>> wonkden.net
>> The most important skill a programmer can have is the ability to read.
>
> --
> If you think of MS-DOS as mono, and Windows as stereo,
> then Linux is Dolby Digital and all the music is free...
>
Given that this is RSS, you should be able to get away with using a
regex to fix the links. This works for me:

#!/usr/bin/perl

use strict;
use warnings;

use XML::RSS::Parser;
use URI::Escape qw/uri_escape uri_unescape/;

my $filename = shift;

# slurp the whole file into one string
my $xml = do {
	open my $fh, "<", $filename
		or die "could not open $filename: $!";
	local $/;
	<$fh>;
};

# percent-encode the contents of every <link> element so the parser
# never sees the invalid characters
$xml =~ s{<link>(.*?)</link>}{"<link>" . uri_escape($1) . "</link>"}seg;

my $p = XML::RSS::Parser->new
	or die "could not create parser\n";
my $feed = $p->parse_string($xml)
	or die "could not parse $filename: ", $p->errstr, "\n";

for my $item ( $feed->query('//item') ) {
	my $title = $item->query('title')->text_content;
	# undo the percent-encoding to get the original link back
	my $link  = uri_unescape $item->query('link')->text_content;
	printf "%60.60s: %s\n", $title, $link;
}

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
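
The link-escaping step from the script above can be tried in isolation, without an XML parser. What follows is only a minimal sketch under stated assumptions: the sample item is made up, and it assumes the offending character is a bare "&" inside a <link> element (the pastebin sample is not reproduced here). escape_text and unescape_text are hypothetical core-Perl stand-ins for uri_escape and uri_unescape from URI::Escape, so the sketch has no CPAN dependency.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_escape (core Perl only).
# Percent-encodes everything outside the RFC 3986 unreserved set.
# Assumes byte/ASCII input; wide characters are not handled here.
sub escape_text {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical stand-in for URI::Escape::uri_unescape.
sub unescape_text {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Escape the body of every <link> element, leaving the tags alone.
sub escape_links {
    my ($xml) = @_;
    $xml =~ s{<link>(.*?)</link>}{"<link>" . escape_text($1) . "</link>"}seg;
    return $xml;
}

# Made-up sample: the bare "&" is not allowed outside CDATA.
my $broken = '<item><title>Example</title>'
           . '<link>http://example.com/?a=1&b=2</link></item>';

my $fixed = escape_links($broken);
print "$fixed\n";

# After parsing, the original link is recovered by unescaping.
my ($link) = $fixed =~ m{<link>(.*?)</link>};
print unescape_text($link), "\n";
```

The first line printed contains <link>http%3A%2F%2Fexample.com%2F%3Fa%3D1%26b%3D2</link>, which no longer has a bare "&" for the parser to choke on; the second line is the original URL again. Only <link> bodies are touched, so invalid characters elsewhere in the feed would still need separate handling.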