okay okay okay
Let me smile a bit first. I'm a newbie, a kid.
I'm still new to the intricacies of regular expressions.
Maybe some day I will succeed in crossing 3000 km on foot in 24 hours.
hahahaha.
Thank you all.
*I finished the job using HTML::Strip much earlier*
I hate it when I post something and then find a bit of information I should
have included.
http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg
The poster lists four valid HTML constructs that regexes are ill-equipped to
handle.
On Sat, Apr 14, 2012 at 07:05:54PM +0300, Shlomi Fish wrote:
> Hi Somu,
>
> On Sat, 14 Apr 2012 21:01:03 +0530
> Somu wrote:
>
> > OK. Can I ask "WHY?"
> > Why can't it be done using regex? Isn't an HTML file just another long
> > string with more, but similar, special characters??
> >
>
> first
Hi Somu,
On Sat, 14 Apr 2012 21:01:03 +0530
Somu wrote:
> OK. Can I ask "WHY?"
> Why can't it be done using regex? Isn't an HTML file just another long
> string with more, but similar, special characters??
>
First of all, I should note that you appear to be replying to the wrong messages
which br
On 04/14/2012 11:42 AM, Zheng Du wrote:
> Hi Somu,
> Of course it can be done using a regex, but if a single-line
> command can do the job, that is certainly more efficient and less bug-prone.
Actually, it can't be done by a regex. Consider the issue of comments.
Think about comments containing
Hi Somu,
Of course it can be done using a regex, but if a single-line
command can do the job, that is certainly more efficient and less bug-prone.
Unless you're eager to polish your Perl skills. =D
Du Zheng
2012/4/14 Somu
> OK. Can I ask "WHY?"
> Why can't it be done using regex? Isn't a
OK. Can I ask "WHY?"
Why can't it be done using regex? Isn't an HTML file just another long
string with more, but similar, special characters??
Somu
Hi Somu,
On Sat, 14 Apr 2012 14:46:50 +0530
Somu wrote:
> Sir, what is this??:
>
> lynx -stdin -dump < in.html > out.txt
>
It's a UNIX command. What it does is take the file "in.html" (without the
quotes), pipe it through "lynx -stdin -dump" and put its output in the
"out.txt" fi
Hi Somu,
On Sat, 14 Apr 2012 12:56:03 +0530
Somu wrote:
> *Hi all,
> I was trying to strip all HTML tags and the special characters from an
> HTML file using regex.
> my code is as follows..
Please don't use regular expressions to parse and process HTML:
*
http://perl-begin.org/FAQs/freeno
Sir, what is this??:
lynx -stdin -dump < in.html > out.txt
For now, the job got done with HTML::Strip.
@Zheng Du, I will try your suggestion, but might the other files be too big for
one variable? (These are files containing words and their meanings.)
Somu.
Hi Som,
It looks like you want a minimal (non-greedy) match, so you can change the code like this:
$line =~ s/(<.*>)?//;
=>
$line =~ s/<.*?>//g;
But there is still a problem: you have '<' and '>' on different
lines, so you can try reading the whole file content into a variable and
replacing them once for al
On 2012-04-14 09:26, Somu wrote:
> I was trying to strip all HTML tags and the special characters from an
> HTML file using regex.
Alternative:
lynx -stdin -dump < in.html > out.txt
--
Ruud
*Hi all,
I was trying to strip all HTML tags and the special characters from an
HTML file using regex.
my code is as follows..
*
use strict;
use warnings;
sub strip_html{
my $line = shift;
#something wrong in the following