Hi,
~
 Sometimes one needs to read just the basic content of some text,
without tags and other embedded text segments.
~
 I am trying to clean up Slashdot comments, but there are some patterns
I am not getting right. Here are the kinds of text segments I would like
to match:
~
// __ (1) lines that look like: starting with "Re:" + a number of
characters (or words?) + three dots + space + "(Score:" + a number from
1 to 5 + ", " (comma space) + some word
~
Re:An the solution is.... (Score:3, Interesting)
Re:An the solution is.... (Score:5, Informative)
Re:Its a pity that... (Score:4, Informative)
Re:Quick Fix (Score:4, Informative)
Re:Quick Fix (Score:4, Insightful)
~
 I am matching with the pattern:
~
^Re:[^()]+\s
~
 All this pattern basically says (to my understanding) is "match
everything that starts with 'Re:' and has some characters right after".
However, I am not able to add a suffix like
"\.\.\. \(Score:[1-5], \w\)$" to that pattern.
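
One possible way to attach the suffix is sketched below in Python; the pattern uses only constructs that Perl regexes also accept. Two assumptions are made here: the three dots are treated as ordinary title text (the "Re:Quick Fix" samples have none), and the moderation label is a single word, so it is written as \w+ rather than \w, which would match just one character. The names RE_REPLY_TITLE and is_reply_title are made up for this illustration.

```python
import re

# "Re:" + title text without parentheses + " (Score:N, Word)" at end of line.
# Assumption: trailing dots are simply part of the title text.
RE_REPLY_TITLE = re.compile(r"^Re:[^()]+ \(Score:[1-5], \w+\)$")


def is_reply_title(line: str) -> bool:
    """Return True if the line looks like a 'Re:' comment title with a score."""
    return RE_REPLY_TITLE.match(line) is not None


samples = [
    "Re:An the solution is.... (Score:3, Interesting)",
    "Re:Its a pity that... (Score:4, Informative)",
    "Re:Quick Fix (Score:4, Insightful)",
]
matched = [s for s in samples if is_reply_title(s)]
```

The character class [^()]+ is greedy, but it backtracks one character to give up the space that must precede "(Score:".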
~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
// __ (2) lines that look like: starting with "by " + a number of
characters (or words?) + "(" + number + ")" + words + " on " +
formatted date + formatted time + "(#" + numbers + ")" + (optionally,
zero or more words separated by spaces)
~
by Zebra_X (13249) Alter Relationship on Friday July 25, @08:23AM (#24332653)
by techno-vampire (666512) Alter Relationship on Friday July 25, @04:49PM (#24341237) Homepage
by javilon (99157) Alter Relationship on Friday July 25, @10:05AM (#24334257) Homepage
by quantum bit (225091) Alter Relationship on Friday July 25, @10:32AM (#24334757) Journal
by Pharmboy (216950) Alter Relationship on Friday July 25, @12:55PM (#24337273) Homepage Journal
~
 I am matching with the pattern:
~
^by [^()]+\s
~
 This is similar to my explanation above, but I am missing much more that
could be matched.
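
A fuller match for the "by" lines could look something like the following Python sketch. It assumes each byline sits on a single line of input, that the time is always two-digit hours and minutes followed by AM or PM, and that the trailing extras ("Homepage", "Journal") are plain words. RE_BYLINE and the group names are hypothetical, chosen only for this example.

```python
import re

# One named group per piece described in (2), joined by literal spaces.
RE_BYLINE = re.compile(
    r"^by (?P<user>[^()]+) \((?P<uid>\d+)\) "      # user name and numeric id
    r"(?P<label>.+?) on "                          # e.g. "Alter Relationship"
    r"(?P<weekday>\w+) (?P<month>\w+) (?P<day>\d{1,2}), "
    r"@(?P<time>\d{2}:\d{2}[AP]M) "                # e.g. "@10:32AM"
    r"\(#(?P<cid>\d+)\)"                           # comment id
    r"(?P<extras>(?: \w+)*)$"                      # zero or more trailing words
)

line = ("by quantum bit (225091) Alter Relationship on "
        "Friday July 25, @10:32AM (#24334757) Journal")
m = RE_BYLINE.match(line)
fields = m.groupdict() if m else {}
```

Capturing each piece separately makes it easy to keep or discard fields afterwards, instead of only testing whether the line matches.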
~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
// __ (3) lines that look like: free text + partial pattern
" \(Score:[1-5], " + word + partial pattern "\)"
~
I disagree with most of these posts (Score:3, Insightful)
Been there, done that (Score:5, Insightful)
Update (Score:3, Informative)
That does not make any sense. (Score:3, Interesting)
Lack of professionalism, IMO (Score:4, Funny)
~
 I am matching with:
~
[^()]+ \(Score:[1-5]\)?
~
 but I am missing the last part of it
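
In the pattern above, "\)?" makes a closing parenthesis optional right after the digit, so the ", word" part is never consumed. A minimal Python sketch of one way to complete it follows, assuming the label is a single word and the title itself contains no parentheses. RE_TITLE and parse_title are illustrative names only.

```python
import re

# Free text + " (Score:N, Word)" anchored at both ends of the line.
RE_TITLE = re.compile(
    r"^(?P<title>[^()]+) \(Score:(?P<score>[1-5]), (?P<label>\w+)\)$"
)


def parse_title(line: str):
    """Return (title, score, label) for a scored title line, else None."""
    m = RE_TITLE.match(line)
    if m is None:
        return None
    return m.group("title"), int(m.group("score")), m.group("label")


parsed = parse_title("Lack of professionalism, IMO (Score:4, Funny)")
```

Note that this pattern also accepts the "Re:" lines from (1), since "Re:..." is free text too; test for the "Re:" form first if the two cases must be told apart.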
~
 Also, do you know of any tables of the common date formats
used on the Internet with their corresponding regex patterns?
~
 Is it possible to specify a format for dates and timestamps, or do
you use ranges for that? I am talking about dates formatted like:
~
 Friday July 25, @08:23AM
 Friday July 25, @12:55PM
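
For this particular format, one approach is to spell out the ranges with alternations and character classes, as in the Python sketch below. A regex only checks the shape of the text: it cannot tell whether July 25 really fell on a Friday, or reject "February 31". The assumptions are full English weekday and month names and a 12-hour clock with a leading zero; RE_STAMP and to_24h are names invented for this example.

```python
import re

WEEKDAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

RE_STAMP = re.compile(
    rf"(?P<weekday>{WEEKDAYS}) (?P<month>{MONTHS}) "
    r"(?P<day>0?[1-9]|[12]\d|3[01]), "                       # 1..31
    r"@(?P<hour>0[1-9]|1[0-2]):(?P<minute>[0-5]\d)(?P<ampm>[AP]M)"
)


def to_24h(m: re.Match) -> tuple:
    """Convert the matched 12-hour time to an (hour, minute) pair on a 24-hour clock."""
    hour = int(m.group("hour")) % 12          # 12AM -> 0, 12PM -> 0 (+12 below)
    if m.group("ampm") == "PM":
        hour += 12
    return hour, int(m.group("minute"))


stamp = RE_STAMP.search("on Friday July 25, @12:55PM (#24337273)")
```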
~
 Thanks
 lbrtchx

