I used mostly the same approach, but without uri_(un)escape it doesn't work. If I understand correctly, in order to parse the feed I need to apply the escape methods to the broken elements. Am I right?
Is there any module with a magic function like fixBrokenXML()?

Thanks

On Fri, Jun 04, 2010 at 01:01:53PM -0400, Chas. Owens wrote:
> On Fri, Jun 4, 2010 at 12:23, Roman Makurin <dro...@gmail.com> wrote:
> > Hi, here it is http://pastebin.org/307289
> >
> > On Fri, Jun 04, 2010 at 12:06:24PM -0400, Chas. Owens wrote:
> >> On Fri, Jun 4, 2010 at 10:16, Roman Makurin <dro...@gmail.com> wrote:
> >> > Hi all
> >> >
> >> > Last time i have a big problem, i need parse xml files
> >> > which have invalid xml chars outside of CDATA and xml
> >> > parser hangs everytime on such files. Is there any way
> >> > to parse such files ???
> >> snip
> >>
> >> Can you give an example of these invalid characters?
> >>
> >> --
> >> Chas. Owens
> >> wonkden.net
> >> The most important skill a programmer can have is the ability to read.
> >
> > --
> > If you think of MS-DOS as mono, and Windows as stereo,
> > then Linux is Dolby Digital and all the music is free...
> >
>
> Given that this is RSS, you should be able to get away with using a
> regex to fix the links. This works for me:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use XML::RSS::Parser;
> use URI::Escape qw/uri_escape uri_unescape/;
>
> my $filename = shift;
>
> my $xml = do {
>     open my $fh, "<", $filename
>         or die "could not open $filename: $!";
>     local $/;
>     <$fh>;
> };
>
> $xml =~ s{<link>(.*?)</link>}{"<link>" . uri_escape($1) . "</link>"}seg;
>
>
> my $p = XML::RSS::Parser->new
>     or die "could not create parser\n";
>
> my $feed = $p->parse_string($xml)
>     or die "could not parse $filename:", $p->errstr, "\n";
>
> for my $item ( $feed->query('//item') ) {
>     my $title = $item->query('title')->text_content;
>     my $link  = uri_unescape $item->query('link')->text_content;
>     printf "%60.60s: %s\n", $title, $link;
> }
>
>
> --
> Chas. Owens
> wonkden.net
> The most important skill a programmer can have is the ability to read.

--
If you think of MS-DOS as mono, and Windows as stereo,
then Linux is Dolby Digital and all the music is free...