Re: Stripping HTML from a text file.

2003-09-05 Thread Randal L. Schwartz
> "Sara" == Sara <[EMAIL PROTECTED]> writes: Sara> I have a couple of text files with html code in them.. e.g. Sara> -- Text File -- Sara> Sara> Sara> This is Test File Sara> Sara> Sara> This is the test file contents Sara> Sara> blah blah blah

Re: Stripping HTML from a text file.

2003-09-04 Thread drieux
On Thursday, Sep 4, 2003, at 17:55 US/Pacific, Hanson, Rob wrote: $text =~ s|().*?.*?.*?()|$1$2$3|s; actually that should be: $text =~ s|().*?(.*?).*?()|$1$2$3|s; way stylish! I actually like. But assumes that there will be a title element - otherwise it will fail and not clear out the other s
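The failure mode noted above (no title element, so the substitution never matches) can be sidestepped by matching the outer element first and looking for the title separately. A minimal sketch, assuming the tags stripped from the archived regex were `<head>` and `<title>`; the helper name `clear_head_keep_title` is hypothetical and not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): empty the <head> element
# but keep its <title> if one exists. Assumes a single, well-formed
# <head>...</head> pair.
sub clear_head_keep_title {
    my ($text) = @_;
    $text =~ s{(<head\b[^>]*>)(.*?)(</head\s*>)}{
        # Copy the captures before running the inner match, which resets $1.
        my ($open, $inner, $close) = ($1, $2, $3);
        my ($title) = $inner =~ m{(<title\b[^>]*>.*?</title\s*>)}is;
        $open . (defined $title ? $title : '') . $close;
    }eis;
    return $text;
}

print clear_head_keep_title(
    "<head><title>This is Test File</title><style>p { }</style></head>\n"
);
print clear_head_keep_title("<head><style>p { }</style></head>\n");
```

With a title present the head is reduced to just that title; without one the head is simply emptied rather than left untouched.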

Re: Stripping HTML from a text file.

2003-09-04 Thread Wiggins d'Anconia
drieux wrote: It could just be my OCD, but if I could have hammered flat every FROOOTLOOP who wanted merely a 'quick and dirty' one time only fix, 'honest, it's just this one time', rather than actually cure the root cause problem, WE would be on a flat earth from all the pounding That or we

Re: Stripping HTML from a text file.

2003-09-04 Thread Sara
'Anconia'" <[EMAIL PROTECTED]>; "'Sara'" <[EMAIL PROTECTED]> Cc: "beginperl" <[EMAIL PROTECTED]> Sent: Friday, September 05, 2003 5:55 AM Subject: RE: Stripping HTML from a text file. : > Or maybe I misunderstood the question : : Or maybe I d

Re: Stripping HTML from a text file.

2003-09-04 Thread drieux
On Thursday, Sep 4, 2003, at 17:55 US/Pacific, Hanson, Rob wrote: [..] I agree... but only if you are looking for a strong permanent solution. The regex way is good for quick and dirty HTML work. [..] technically we agree right up to the 'quick and dirty' part... I mean, how many times have we wa

RE: Stripping HTML from a text file.

2003-09-04 Thread Hanson, Rob
04, 2003 8:48 PM To: 'Sara' Cc: beginperl Subject: Re: Stripping HTML from a text file. Won't this remove *everything* between the given tags? Or maybe I misunderstood the question, I thought she wanted to remove the "code" from all of the contents between two tags? Becaus

Re: Stripping HTML from a text file.

2003-09-04 Thread drieux
On Wednesday, Sep 3, 2003, at 03:32 US/Pacific, Sara wrote: [..] What I want to do is to remove/delete HTML code from the text file from a certain tag up to a certain tag. For example, I want to delete the code completely that comes in between and (including any style tags and embedded javascrip

Re: Stripping HTML from a text file.

2003-09-04 Thread Wiggins d'Anconia
Won't this remove *everything* between the given tags? Or maybe I misunderstood the question, I thought she wanted to remove the "code" from all of the contents between two tags? Because of the complexity and variety of HTML code, the number of different tags, etc. I would suggest using an HTML

RE: Stripping HTML from a text file.

2003-09-04 Thread Hanson, Rob
A simple regex will do the trick... # untested $text = "..."; $text =~ s|.*?||s; Or something more generic... # untested $tag = "head"; $text =~ s|<$tag[^>]*?>.*?</$tag>||s; This second one also allows for possible attributes in the start tag. You may need more than this if the HTML isn't well-formed
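For reference, the generic form can be written out as a complete script. This is a sketch only, assuming the tags lost from the archived snippet were an opening `<head ...>` and a closing `</head>` (suggested by `$tag = "head"`); the helper name `strip_element` and the sample document are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove every
# <$tag ...>...</$tag> span, including whatever sits between the tags.
sub strip_element {
    my ($text, $tag) = @_;
    # \Q..\E quotes the tag name; \b keeps "head" from matching "<header>";
    # [^>]* allows attributes in the start tag; /s lets . cross newlines,
    # /i ignores case, /g removes every occurrence.
    $text =~ s{<\Q$tag\E\b[^>]*>.*?</\Q$tag\E\s*>}{}gis;
    return $text;
}

my $html = <<'END';
<html>
<head profile="x">
<title>This is Test File</title>
<style>body { color: red }</style>
</head>
<body>
This is the test file contents
</body>
</html>
END

print strip_element($html, 'head');
```

As the thread points out, this is still the quick and dirty route: nested elements of the same name, a missing close tag, or a literal `</head>` inside a comment or script will trip it up, and that is where a real HTML parser earns its keep.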