I'm reminded of a reasonable quote: "It's easy to write HTML, and impossible to parse it." Because HTML is *SO* easy to write, has so many options, and is easy to screw up AND still have browsers render it, it's totally evil to parse.
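
For what it's worth, here is a minimal sketch of the regular-expression route. It is in Python only so it is easy to run; PHP's preg_replace_callback() follows the same pattern-plus-callback arrangement. The names strip_url_quotes, TAG, ATTR and SAFE_UNQUOTED are made up for illustration. It assumes the HTML 4.01 rule that an unquoted value may contain only letters, digits, hyphens, periods, underscores and colons, so anything else (a slash, a space, an empty value) keeps its quotes. It also assumes no quoted value contains "<" or ">", which is exactly the kind of case a regex gets wrong and a real parser would not.

```python
import re

# A start tag: "<", a letter, then anything up to the next ">".
# Assumption: no attribute value contains "<" or ">".
TAG = re.compile(r"<[A-Za-z][^<>]*>")

# href=... or src=... with a single- or double-quoted value.
# group 1: name plus "=", group 2: the quote character, group 3: the value.
# The lookbehind skips names such as "data-src"; the lookahead requires
# whitespace or ">" after the closing quote, so "a.png"/> is left alone.
ATTR = re.compile(
    r"""(?<![\w-])((?:href|src)\s*=\s*)(["'])(.*?)\2(?=[\s>])""",
    re.IGNORECASE | re.DOTALL,
)

# Characters HTML 4.01 permits in an unquoted attribute value.
SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9._:-]+")


def _unquote_attr(match):
    """Drop the quotes only when the value is legal without them."""
    prefix, value = match.group(1), match.group(3)
    if SAFE_UNQUOTED.fullmatch(value):
        return prefix + value
    return match.group(0)  # leave the attribute exactly as written


def strip_url_quotes(html):
    """Unquote href/src values inside start tags; text content is untouched."""
    return TAG.sub(lambda tag: ATTR.sub(_unquote_attr, tag.group(0)), html)


if __name__ == "__main__":
    print(strip_url_quotes('<a href="foo.php">This link has "quotes"</a>'))
    print(strip_url_quotes('<img src="a.png" title="this is a pop-up description" />'))
```

Only href and src are touched, so a title with spaces in it keeps its quotes, and quotes in the text between tags are never seen by the attribute pattern. Loosening SAFE_UNQUOTED to allow "/" would strip more URLs, at the cost of stepping outside what the spec allows.
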
You end up with massive regular expressions, or a state engine which reads the string one character at a time, and can keep track of what's going on, then make changes. I've been working on a state engine for a while, with some success. Basically, a browser is just a big state engine.

I think you should perhaps look at your problem from another angle... there are MANY valid ways to write HTML tags... whatever your system does, it should support user preferences.

<A HREF=foo.php>    = valid
<A HREF="foo.php">  = valid
<A HREF='foo.php'>  = valid

Plus combinations of single and double quotes for different attributes, etc etc.

Check out http://www.w3.org/TR/1999/REC-html401-19991224/intro/sgmltut.html

Justin French

on 01/08/02 10:43 AM, electroteque ([EMAIL PROTECTED]) wrote:

> i sorter need a preg example i'm not very good at it , and its for a wysiwyg
> dhtml editor , it reformats those tags if the quotes are there when i load
> the content and stuffs the code
>
> -----Original Message-----
> From: Joel Boonstra [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 01, 2002 6:18 AM
> To: [EMAIL PROTECTED]
> Cc: electroteque
> Subject: Re: stripping quotes from urls and images
>
>
>> hi guys i now have a problem with urls i need to remove the quotes from both
>> href="" and src=""
>>
>> so
>> <a href="blah"> <img src=""> needs to be <a href=> <img src=> and i cant
>> remove quotes from all string matches :|
>
> Someone mentioned this already, but you should really know that if you
> remove quotes from your HTML attributes, your HTML will no longer be
> forward-compatible with XHTML, which you really should strive for. What
> reason would you have for removing the quotes? There must be a better
> workaround than removing them.
>
> Regarding your question, it really depends on how you can operate on
> your HTML. Where is your HTML? Is it in a database, string, etc? Is
> it all in one chunk? That is, do you have stuff like this:
>
> <a href="some/url"><img src="some/image">This link has "quotes"</img></a>
>
> that can't just have all quotes removed entirely from it? If you really
> need to remove quotes from attributes (and I don't think you do), and
> all of your HTML is in one big string, you're looking at a regular
> expression. Read up on them here:
>
> http://www.php.net/manual/en/pcre.pattern.syntax.php
>
> Yep, that's a lot of reading. But you should learn about regular
> expressions; they'll be useful all over the place. If you happen to be
> on a Linux/Unix/BSD/whatever machine that has man pages and perl
> installed, check out `man perlre`. Or read on-line here:
>
> http://www.perldoc.com/perl5.6.1/pod/perlre.html
>
> Some things to watch out for -- even if you do want to remove quotes
> from your attributes, I'm 100% sure you don't want to remove from *all*
> of them. Like this:
>
> <img src="/some/image/" title="this is a pop-up description" />
>
> Removing the quotes from the title attribute will likely break at least
> some browsers, if not all. So your regular expression needs to be able
> to handle that gracefully.
>
> Good luck!
>
> --
> [ joel boonstra | [EMAIL PROTECTED] ]
>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php