I have an HTML document at home (the Firefox bookmarks output) that I
was trying to parse this morning. It contains many links like this:
<a href="http://www.aximsite.com/articles/link.php?id=22"
add_date="1130275531" last_charset="windows-1252"
id="rdf:#$FOQot">Knowledge Base: What Can I Do With My Axim?</a>

I want to parse it to remove the attributes add_date, last_charset,
id, and others that appear in other entries. The visible text of the
file is only 22 kb, but the HTML is over 500 kb!

I don't have the code I was trying with me (I'm not at home now, but
it has been nagging me all day). I was running into problems and could
NOT get it to remove just the attributes. One regex solution left me
with <a></a>, and others with <>, <a href="http://address"
="something" ="something else">blahblah</a>, etc.

How can I get it to remove, say, attribute xxx and the ="something"
that follows it? In all fairness, I am not good at regexes and need to
practice, but I would appreciate any help I can get. I'm really bogged
down with studies and simply cannot devote a full day to this
'trivial' exercise.
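
To make the question concrete, here is a minimal sketch of the kind of
substitution I have in mind. It assumes Python and its standard re
module (I have not settled on a language), and the names UNWANTED,
ATTR_RE and strip_attributes are made up for illustration. It also
assumes every attribute value is double-quoted, as in the sample above.
The idea is to match the attribute name together with its ="value" in
one pattern, so that nothing like ="something" is left behind.

```python
import re

# Hypothetical list of attribute names to strip; extend as needed.
UNWANTED = ("add_date", "last_charset", "id")

# One pattern that covers the whole attribute: the whitespace before it,
# the attribute name, the "=", and the double-quoted value.
# Requiring leading whitespace keeps "?id=22" inside the href untouched.
ATTR_RE = re.compile(
    r'\s+(?:' + "|".join(re.escape(name) for name in UNWANTED) + r')\s*=\s*"[^"]*"',
    re.IGNORECASE,  # bookmark exports may write attribute names in upper case
)


def strip_attributes(html: str) -> str:
    """Remove each unwanted attribute together with its ="value" part."""
    return ATTR_RE.sub("", html)


sample = (
    '<a href="http://www.aximsite.com/articles/link.php?id=22"\n'
    'add_date="1130275531" last_charset="windows-1252"\n'
    'id="rdf:#$FOQot">Knowledge Base: What Can I Do With My Axim?</a>'
)

print(strip_attributes(sample))
```

This is only a regex sketch, not a real HTML parser: it would also
strip text such as  id="x"  if that appeared between tags, and it does
not handle single-quoted or unquoted values.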

Thanks, all!

Dotan Cohen
http://technology-sleuth.com/long_answer/why_are_internet_greeting_cards_dangerous.html
