Liam,

If you used stristr() to find the </head> tag and then discarded everything up to and including it, would that take care of things?

=dn
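Here is a minimal PHP sketch of that suggestion. It assumes the whole uploaded file has already been read into one string (for example with file_get_contents()) rather than handled line by line. The function name drop_head() is hypothetical.

```php
<?php
// Hypothetical helper: drop everything up to and including </head>.
function drop_head(string $html): string
{
    $closeTag = '</head>';
    // stristr() is case-insensitive and returns the rest of the string
    // starting at the first match, or false if there is no match.
    $rest = stristr($html, $closeTag);
    if ($rest === false) {
        return $html; // no </head>: leave the document untouched
    }
    // stristr() keeps the matched tag, so skip past it.
    return substr($rest, strlen($closeTag));
}
```

Because the string holds the whole document, strip_tags() can then be run over the result in a single pass.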
> I've got a little problem with HTML tags. Here's the description.
>
> My site accepts HTML files by upload. Many of these files are written in MS
> Word and then saved as HTML from there. MS Word likes to put a bunch of
> garbage at the beginning of the file. When users upload their HTML files,
> my script runs strip_tags() to remove the unnecessary junk, but it can't get
> rid of the junk (HTML, XML, CSS, JavaScript) at the beginning of the file.
> Some of these tags span multiple lines, and because my script works through
> the file line by line, it doesn't recognize them as tags. Is there a simpler
> way? I don't need the style sheet information, because I have my own style
> sheet that styles the files the way they should be. I don't want the extra
> tags either, even though they're invisible to users viewing the page in a
> browser, because these files get e-mailed: for HTML mail that's fine, but
> for plain-text mail I need to strip them down, and that's the problem.
>
> =================================================
> Just in case, I've included the HTML code below:
>
> <html xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:w="urn:schemas-microsoft-com:office:word"
> xmlns="http://www.w3.org/TR/REC-html40">
>
> <head>
> <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
> <meta name=ProgId content=Word.Document>
> <meta name=Generator content="Microsoft Word 10">
> <meta name=Originator content="Microsoft Word 10">
> <link rel=File-List href="NW100_files/filelist.xml">
> <title>Test test test</title>
> <!--[if gte mso 9]><xml>
> <o:DocumentProperties>
> <o:Author>Liam Gibbs</o:Author>
> <o:LastAuthor>Liam Gibbs</o:LastAuthor>
> <o:Revision>1</o:Revision>
> <o:TotalTime>1</o:TotalTime>
> <o:Created>2002-08-30T18:09:00Z</o:Created>
> <o:LastSaved>2002-08-30T18:10:00Z</o:LastSaved>
> <o:Pages>1</o:Pages>
> <o:Words>13</o:Words>
> <o:Characters>79</o:Characters>
> <o:Company>SXIA</o:Company>
> <o:Lines>1</o:Lines>
> <o:Paragraphs>1</o:Paragraphs>
> <o:CharactersWithSpaces>91</o:CharactersWithSpaces>
> <o:Version>10.3501</o:Version>
> </o:DocumentProperties>
> </xml><![endif]--><!--[if gte mso 9]><xml>
> <w:WordDocument>
> <w:SpellingState>Clean</w:SpellingState>
> <w:GrammarState>Clean</w:GrammarState>
> <w:Compatibility>
> <w:BreakWrappedTables/>
> <w:SnapToGridInCell/>
> <w:WrapTextWithPunct/>
> <w:UseAsianBreakRules/>
> </w:Compatibility>
> <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
> </w:WordDocument>
> </xml><![endif]-->
> <style>
> <!--
> /* Style Definitions */
> p.MsoNormal, li.MsoNormal, div.MsoNormal
> {mso-style-parent:"";
> margin:0cm;
> margin-bottom:.0001pt;
> mso-pagination:widow-orphan;
> font-size:12.0pt;
> font-family:"Times New Roman";
> mso-fareast-font-family:"Times New Roman";}
> span.SpellE
> {mso-style-name:"";
> mso-spl-e:yes;}
> @page Section1
> {size:612.0pt 792.0pt;
> margin:72.0pt 90.0pt 72.0pt 90.0pt;
> mso-header-margin:35.4pt;
> mso-footer-margin:35.4pt;
> mso-paper-source:0;}
> div.Section1
> {page:Section1;}
> -->
> </style>
> <!--[if gte mso 10]>
> <style>
> /* Style Definitions */
> table.MsoNormalTable
> {mso-style-name:"Table Normal";
> mso-tstyle-rowband-size:0;
> mso-tstyle-colband-size:0;
> mso-style-noshow:yes;
> mso-style-parent:"";
> mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
> mso-para-margin:0cm;
> mso-para-margin-bottom:.0001pt;
> mso-pagination:widow-orphan;
> font-size:10.0pt;
> font-family:"Times New Roman";}
> </style>
> <![endif]-->
> </head>
>
> <body lang=EN-US style='tab-interval:36.0pt'>
>
> <div class=Section1>
>
> <p class=MsoNormal>Test <span class=SpellE>test</span> <span
> class=SpellE>test</span></p>
>
> <p class=MsoNormal align=center style='text-align:center'><span
> class=SpellE>Fdjfkasdjfkla</span></p>
>
> <p class=MsoNormal align=center style='text-align:center'><span
> class=SpellE><b
> style='mso-bidi-font-weight:normal'>Fdjkslafjdklaf</b></span></p>
>
> <p class=MsoNormal style='text-align:justify'><o:p> </o:p></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> class=SpellE>Fdasfdfasffasdfdaadfdfs</span></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> class=SpellE>Dfsdfs</span></p>
>
> <p class=MsoNormal style='text-align:justify'>Hi</p>
>
> <p class=MsoNormal style='text-align:justify'><o:p> </o:p></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> style='mso-tab-count:3'> </span><span
> class=SpellE>Jfdklas</span></p>
>
> <p class=MsoNormal style='text-align:justify'><o:p> </o:p></p>
>
> </div>
>
> </body>
>
> </html>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
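For the plain-text mail case, here is a sketch that processes the whole file at once instead of line by line. The helper name html_to_plain_text() is hypothetical. It assumes the upload is already UTF-8; the sample above declares windows-1252, so it may need converting first, e.g. with mb_convert_encoding(). It swaps the stristr() cut for a case-insensitive regular expression so that <style> or <script> blocks outside the head are removed too.

```php
<?php
// Hypothetical helper: turn a Word-generated HTML document into plain text.
// Assumes $html holds the entire file and is UTF-8 encoded.
function html_to_plain_text(string $html): string
{
    // Normalise Windows/Mac line endings to "\n".
    $html = str_replace(["\r\n", "\r"], "\n", $html);
    // The /s modifier lets "." match newlines, so blocks that span many
    // lines are still caught; /i makes the match case-insensitive.
    $html = preg_replace('#<head\b.*?</head>#is', '', $html);
    // Drop any <style> or <script> blocks that appear outside the head.
    $html = preg_replace('#<(style|script)\b.*?</\1>#is', '', $html);
    // strip_tags() removes the remaining tags (even ones wrapped across
    // lines) and HTML comments such as Word's <!--[if ...]> blocks.
    $text = strip_tags($html);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of spaces/tabs and trim spaces around line breaks.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace("/ *\n */", "\n", $text);
    // Keep at most one blank line between paragraphs.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    return trim($text);
}
```

Because strip_tags() sees the whole document, a tag like `<span` / `class=SpellE>` that Word wrapped across two lines is removed like any other.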