Liam,
If you were to use stristr() to find the </head> tag and remove everything up
to and including it, would that take care of things?
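A minimal sketch of that suggestion. It assumes the uploaded file has first been read into one string (e.g. with file_get_contents()) rather than line by line; the helper name strip_head() is hypothetical. Note that stristr() returns the needle plus everything after it, so the </head> tag itself still has to be cut off:

```php
<?php
// Hypothetical helper: drop everything up to and including </head>.
// stristr() is case-insensitive, so </HEAD> written by other editors matches too.
function strip_head(string $html): string
{
    $tail = stristr($html, '</head>');
    if ($tail === false) {
        // No </head> tag found: leave the document unchanged.
        return $html;
    }
    // stristr() keeps the needle itself, so skip past the "</head>" text.
    return substr($tail, strlen('</head>'));
}

$html = "<html>\n<head>\n<title>Test</title>\n<style>\np { }\n</style>\n</HEAD>\n<body><p>Hi</p></body>\n</html>";
echo strip_head($html);
```

What remains is the <body> markup, which strip_tags() can then clean up. Because the file is handled as a single string, it no longer matters that the <head> tags span several lines.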
=dn


> I've got a lil problem with HTML tags. Here's the description.
>
> My site accepts HTML files by upload. A lot of these files are written in MS
> Word and then saved as HTML from there, and MS Word likes to put a bunch of
> garbage at the beginning of the file. When users upload their HTML files,
> my script runs strip_tags() to remove all of the unnecessary junk, except it
> can't get rid of the junk (HTML, XML, CSS, JavaScript) at the beginning of
> the file. Some of these tags span multiple lines, and because my script goes
> through the file line by line, it doesn't recognize them as tags. Is there a
> simpler way? I don't need the stylesheet information and the like, because I
> have my own stylesheet that styles the files the way they should be. I don't
> want the extra tags, even though they're invisible to users viewing the file
> on the web, because these files can be e-mailed (for HTML mail that's fine;
> for plain-text mail I need to strip them down, and that's the problem).
>
> =================================================
> Just in case, I've included the HTML code below:
>
>
> <html xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:w="urn:schemas-microsoft-com:office:word"
> xmlns="http://www.w3.org/TR/REC-html40">
>
> <head>
> <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
> <meta name=ProgId content=Word.Document>
> <meta name=Generator content="Microsoft Word 10">
> <meta name=Originator content="Microsoft Word 10">
> <link rel=File-List href="NW100_files/filelist.xml">
> <title>Test test test</title>
> <!--[if gte mso 9]><xml>
>  <o:DocumentProperties>
>   <o:Author>Liam Gibbs</o:Author>
>   <o:LastAuthor>Liam Gibbs</o:LastAuthor>
>   <o:Revision>1</o:Revision>
>   <o:TotalTime>1</o:TotalTime>
>   <o:Created>2002-08-30T18:09:00Z</o:Created>
>   <o:LastSaved>2002-08-30T18:10:00Z</o:LastSaved>
>   <o:Pages>1</o:Pages>
>   <o:Words>13</o:Words>
>   <o:Characters>79</o:Characters>
>   <o:Company>SXIA</o:Company>
>   <o:Lines>1</o:Lines>
>   <o:Paragraphs>1</o:Paragraphs>
>   <o:CharactersWithSpaces>91</o:CharactersWithSpaces>
>   <o:Version>10.3501</o:Version>
>  </o:DocumentProperties>
> </xml><![endif]--><!--[if gte mso 9]><xml>
>  <w:WordDocument>
>   <w:SpellingState>Clean</w:SpellingState>
>   <w:GrammarState>Clean</w:GrammarState>
>   <w:Compatibility>
>    <w:BreakWrappedTables/>
>    <w:SnapToGridInCell/>
>    <w:WrapTextWithPunct/>
>    <w:UseAsianBreakRules/>
>   </w:Compatibility>
>   <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
>  </w:WordDocument>
> </xml><![endif]-->
> <style>
> <!--
>  /* Style Definitions */
>  p.MsoNormal, li.MsoNormal, div.MsoNormal
> {mso-style-parent:"";
> margin:0cm;
> margin-bottom:.0001pt;
> mso-pagination:widow-orphan;
> font-size:12.0pt;
> font-family:"Times New Roman";
> mso-fareast-font-family:"Times New Roman";}
> span.SpellE
> {mso-style-name:"";
> mso-spl-e:yes;}
> @page Section1
> {size:612.0pt 792.0pt;
> margin:72.0pt 90.0pt 72.0pt 90.0pt;
> mso-header-margin:35.4pt;
> mso-footer-margin:35.4pt;
> mso-paper-source:0;}
> div.Section1
> {page:Section1;}
> -->
> </style>
> <!--[if gte mso 10]>
> <style>
>  /* Style Definitions */
>  table.MsoNormalTable
> {mso-style-name:"Table Normal";
> mso-tstyle-rowband-size:0;
> mso-tstyle-colband-size:0;
> mso-style-noshow:yes;
> mso-style-parent:"";
> mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
> mso-para-margin:0cm;
> mso-para-margin-bottom:.0001pt;
> mso-pagination:widow-orphan;
> font-size:10.0pt;
> font-family:"Times New Roman";}
> </style>
> <![endif]-->
> </head>
>
> <body lang=EN-US style='tab-interval:36.0pt'>
>
> <div class=Section1>
>
> <p class=MsoNormal>Test <span class=SpellE>test</span> <span
> class=SpellE>test</span></p>
>
> <p class=MsoNormal align=center style='text-align:center'><span
> class=SpellE>Fdjfkasdjfkla</span></p>
>
> <p class=MsoNormal align=center style='text-align:center'><span
> class=SpellE><b
> style='mso-bidi-font-weight:normal'>Fdjkslafjdklaf</b></span></p>
>
> <p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> class=SpellE>Fdasfdfasffasdfdaadfdfs</span></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> class=SpellE>Dfsdfs</span></p>
>
> <p class=MsoNormal style='text-align:justify'>Hi</p>
>
> <p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>
>
> <p class=MsoNormal style='text-align:justify'><span
> style='mso-tab-count:3'> </span><span
> class=SpellE>Jfdklas</span></p>
>
> <p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>
>
> </div>
>
> </body>
>
> </html>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
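The root of the problem in the quoted message is that the script reads the upload line by line, so a tag or comment that spans several lines is never seen whole. A minimal sketch of the alternative, assuming the whole file is read into one string first and that UTF-8 is an acceptable output charset. The function name word_html_to_text() is hypothetical, and the regular expressions are a rough cleanup aimed at Word's output, not a general HTML parser:

```php
<?php
// Hypothetical helper: convert a Word-generated HTML string to plain text.
// It works on the whole file as one string, so tags and comments that
// span several lines are handled (the /s modifier lets "." match newlines).
function word_html_to_text(string $html): string
{
    // Drop the whole <head> section: meta tags, Word's XML and <style> blocks.
    $html = preg_replace('#<head\b.*?</head>#is', '', $html);
    // Drop remaining comments, including Word's conditional
    // <!--[if gte mso 9]> ... <![endif]--> blocks.
    $html = preg_replace('#<!--.*?-->#s', '', $html);
    // Drop any <style> or <script> element that appears outside <head>.
    $html = preg_replace('#<(style|script)\b.*?</\1\s*>#is', '', $html);
    // In HTML, runs of whitespace (including line breaks) render as one space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Turn paragraph ends and <br> tags into real line breaks.
    $html = preg_replace('#</p\s*>|<br\s*/?>#i', "\n", $html);
    // Remove all other tags, then decode entities; &nbsp; becomes a plain space.
    $text = strip_tags($html);
    $text = str_ireplace('&nbsp;', ' ', $text);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Trim each line and drop the empty ones.
    $lines = array_map('trim', explode("\n", $text));
    $lines = array_filter($lines, fn (string $line): bool => $line !== '');
    return implode("\n", $lines);
}

// A cut-down version of the Word output quoted above.
$doc = "<html>\n<head>\n<title>Test test test</title>\n<!--[if gte mso 9]><xml>\n"
     . " <o:DocumentProperties>\n  <o:Author>Liam Gibbs</o:Author>\n"
     . " </o:DocumentProperties>\n</xml><![endif]-->\n<style>\n<!--\n p.MsoNormal\n"
     . "{margin:0cm;}\n-->\n</style>\n</head>\n<body lang=EN-US>\n"
     . "<p class=MsoNormal>Test <span class=SpellE>test</span> <span\n"
     . "class=SpellE>test</span></p>\n<p class=MsoNormal><o:p>&nbsp;</o:p></p>\n"
     . "<p class=MsoNormal style='text-align:justify'>Hi</p>\n</body>\n</html>";
echo word_html_to_text($doc);
```

Collapsing whitespace before converting </p> and <br> to newlines means line breaks that Word inserts inside tags or text do not leak into the plain-text version; empty paragraphs such as <o:p>&nbsp;</o:p> are dropped.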

