On 1/8/06, Jared Williams <[EMAIL PROTECTED]> wrote:
>
> > I have an HTML document at home (the Firefox bookmarks
> > output) that I was trying to parse this morning, with many
> > links as such:
> > <a href="http://www.aximsite.com/articles/link.php?id=22"
> > add_date="1130275531" last_charset="windows-1252"
> > id="rdf:#$FOQot">Knowledge Base: What Can I Do With My Axim?</a>
> >
> > I want to parse it to remove the attributes add_date,
> > last_charset, id, and others that are in other entries. The
> > text of the file it produces is 22 KB, but the HTML is over 500 KB!
> >
> > I don't have the code with me that I was trying (I'm not at
> > home now, but it has been nagging me all day), but I was
> > running into problems and could NOT get it to just remove the
> > attributes. One regex solution left me with <a></a>, and
> > others with <>, <a href="http://address"
> > ="something" ="something else">blahblah</a>, etc.
> >
> > How can I get it to remove, say, attribute xxx and the ="something"
> > that follows it? In all fairness, I am not good at regexes and
> > need to practice, but I would appreciate any help I can get;
> > I'm really bogged down with studies and simply cannot devote
> > a full day to this 'trivial' exercise.
>
> I'd forget regexps and use a SAX-style parser. It's then trivial to remove
> unwanted attributes.
>
> Jared
>
>
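For illustration, here is a minimal sketch of the SAX-style approach Jared describes. It assumes Python 3 (the thread names no language) and uses the standard library's html.parser.HTMLParser, which fires a callback for each start tag, end tag, and run of text. It uses that parser rather than a strict XML SAX parser because a Firefox bookmarks file is not well-formed XML: its <DT> and <p> tags are never closed. The names AttributeStripper, strip_attributes, and UNWANTED are hypothetical; the set holds only the attribute names from the question, so the "others that are in other entries" would have to be added.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical list of attributes to drop; names taken from the example
# bookmark in the question. Extend it for the other attributes.
UNWANTED = {"add_date", "last_charset", "id"}


class AttributeStripper(HTMLParser):
    """Event-driven (SAX-style) pass that re-emits the markup minus unwanted attributes."""

    def __init__(self, unwanted):
        # convert_charrefs=False delivers &amp; and &#39; as separate events,
        # so they can be echoed back verbatim instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.unwanted = {name.lower() for name in unwanted}
        self.out = []

    def _tag(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:  # the parser has already lowercased names
            if name in self.unwanted:
                continue  # skips the name *and* its ="value" together
            if value is None:
                parts.append(name)  # valueless attribute, e.g. <dl compact>
            else:
                # Attribute values arrive unescaped, so escape them again.
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE NETSCAPE-Bookmark-file-1>


def strip_attributes(html_text, unwanted=UNWANTED):
    """Return html_text with every attribute named in `unwanted` removed."""
    parser = AttributeStripper(unwanted)
    parser.feed(html_text)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.out)
```

Because each attribute arrives as a (name, value) pair, skipping the pair drops the name and its ="value" together, which is the part the regex attempts kept getting wrong. The output is normalized, though: tag and attribute names come back lowercased, and every kept value is re-quoted with double quotes.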

Thanks. Looking into that now...

Dotan Cohen
http://technology-sleuth.com/short_answer/what_is_hdtv.html