> > I don't know about how XML::Parser handles memory - last time 
> > I tried to use it to parse content.rdf from http://dmoz.org , 
> > it soaked up all my memory, then bombed. Sometimes, you need 
> > to write your own parsing subs :)
> 
> Is the file you referred to a really big file?

dmoz is sorta like Yahoo's directory, only, er, bigger.
(Not sure if it's really bigger, but it sure is big.)

dmoz's content.rdf is an RDF dump of just about the
whole darn thing.

So, yeah, it's kinda big!

> When dealing with large XML files, an event based parser (eg:
> XML::Parser in native mode, the SAX modules) or a hybrid event/
> tree module like XML::Twig is best.

Yep. Sometimes, you need to write your own parsing subs :)
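
For what it's worth, here's roughly what "your own parsing subs" can
look like when they hang off an event-based parser instead of a tree.
This is only a sketch: count_elements is a made-up helper, and the
'Topic' and 'Page' element names in the sample string are assumptions
for illustration, not a claim about what content.rdf really contains.
The parser calls the Start handler once per opening tag and nothing is
kept around afterwards, so memory use stays flat however big the input.

```perl
use strict;
use warnings;
use XML::Parser;

# Count occurrences of one element name without building a tree.
# count_elements() is a hypothetical helper for this sketch.
sub count_elements {
    my ($source, $wanted) = @_;
    my $count = 0;

    my $parser = XML::Parser->new(
        Handlers => {
            # Start handlers receive ($expat, $element_name, %attributes).
            Start => sub {
                my ($expat, $element) = @_;
                $count++ if $element eq $wanted;
            },
        },
    );

    # parse() takes a string or an open IO::Handle; for a file on
    # disk, $parser->parsefile($path) streams it the same way.
    $parser->parse($source);
    return $count;
}

# Tiny stand-in document; the element names are assumed.
my $sample = '<RDF><Topic/><Topic/><Page/></RDF>';
print count_elements($sample, 'Topic'), "\n";
```

For the real dump you'd point parsefile at content.rdf and do whatever
you need inside the handler, as long as the handler doesn't hoard data.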

> But I can only echo what other people have said about parsing
> XML with regexes - just don't do it!

... don't do it!... don't do it!... don't do it!... don't do it!...

Man, the acoustics in this place...

-----------

If it ain't been mentioned before, Data Munging with Perl
is a good book for outlining approaches to this sort of
thing, especially choosing when to use one or another of
the various XML parsers.
