Hi,

Sometimes one just needs to read the basic content of some text, without tags and other included text segments.

I am trying to cleanse Slashdot comments, but there are some patterns I am not getting right. Here are the patterned segments of text I would like to match:

// __ (1) lines looking like: (starting with "Re:" + a number of characters (or words?)) + three dots + space + "(Score:" + a number from 1 to 5 + ", " (comma, space) + some word

Re:An the solution is.... (Score:3, Interesting)
Re:An the solution is.... (Score:5, Informative)
Re:Its a pity that... (Score:4, Informative)
Re:Quick Fix (Score:4, Informative)
Re:Quick Fix (Score:4, Insightful)

I am matching with the pattern:

^Re:[^()]+\s

All this pattern basically says (to my understanding) is "match everything that starts with 'Re:' and has some characters right after". However, I am not able to add a suffix looking like "\.\.\. \ (Score:[1-5], \w\)$" to that pattern.

// __ (2) lines looking like: (starting with "by " + a number of characters (or words?)) + "(" + number + ")" + words + " on " + formatted date + formatted time + "(#" + numbers + ")" + (optionally 0 or n words separated by spaces)

by Zebra_X (13249) Alter Relationship on Friday July 25, @08:23AM (#24332653)
by techno-vampire (666512) Alter Relationship on Friday July 25, @04:49PM (#24341237) Homepage
by javilon (99157) Alter Relationship on Friday July 25, @10:05AM (#24334257) Homepage
by quantum bit (225091) Alter Relationship on Friday July 25, @10:32AM (#24334757) Journal
by Pharmboy (216950) Alter Relationship on Friday July 25, @12:55PM (#24337273) Homepage Journal

I am matching with the pattern:

^by [^()]+\s

This is similar to my explanation above, but I am missing much more that could be matched.

// __ (3) lines looking like: free text + partial pattern " \(Score: [1-5], " + word + partial pattern "\)"

I disagree with most of these posts (Score:3, Insightful)
Been there, done that (Score:5, Insightful)
Update (Score:3, Informative)
That does not make any sense. (Score:3, Interesting)
Lack of professionalism, IMO (Score:4, Funny)

I am matching with:

[^()]+ \(Score:[1-5]\)?

but I am missing the last part of it.

Also, do you know of any tables out there listing the common date formats used on the Internet and their corresponding regex patterns?

Is it possible to specify formatting in dates and timestamps, or do you use ranges for that? I am talking here about dates formatted like:

Friday July 25, @08:23AM
Friday July 25, @12:55PM

Thanks
lbrtchx
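For illustration, here is a minimal sketch of how the three line types above might be matched in full. It is written in Python only so that it can be run as-is; apart from the named-group syntax (`(?P<name>...)` here, `(?<name>...)` in Perl) the patterns should carry over unchanged. The names `REPLY_SUBJECT`, `BYLINE`, `SUBJECT` and `classify` are hypothetical, not from any library. It assumes the trailing dots are simply part of the title (the samples show four dots, three dots and none), and that the time is a 12-hour clock, which is expressed with alternation and character ranges rather than any date-format syntax.

```python
import re

# (1) Reply subject: "Re:" + title + " (Score:N, Word)".
# The dots are left inside the title because the samples are inconsistent.
REPLY_SUBJECT = re.compile(
    r"^Re:(?P<title>[^()]+?)\s+\(Score:(?P<score>[1-5]), (?P<label>\w+)\)$"
)

# (2) Byline: user, numeric id, free text up to " on ", weekday, month, day,
# 12-hour time, comment id, then zero or more trailing words.
BYLINE = re.compile(
    r"^by (?P<user>[^()]+?) \((?P<uid>\d+)\) (?P<relation>.+?) on "
    r"(?P<weekday>[A-Z][a-z]+day) (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), "
    r"@(?P<time>(?:0[1-9]|1[0-2]):[0-5]\d[AP]M) "
    r"\(#(?P<cid>\d+)\)(?P<links>(?: \w+)*)$"
)

# (3) Top-level subject: free text (not starting with "Re:") + score suffix.
SUBJECT = re.compile(
    r"^(?!Re:)(?P<title>[^()]+?)\s+\(Score:(?P<score>[1-5]), (?P<label>\w+)\)$"
)


def classify(line):
    """Return (kind, fields) for a recognised header line, else None."""
    for kind, pattern in (
        ("reply", REPLY_SUBJECT),
        ("byline", BYLINE),
        ("subject", SUBJECT),
    ):
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return None


if __name__ == "__main__":
    print(classify("Re:Quick Fix (Score:4, Insightful)"))
    print(classify("by javilon (99157) Alter Relationship on Friday July 25, "
                   "@10:05AM (#24334257) Homepage"))
```

Lines for which `classify` returns `None` would then be the comment text worth keeping. The hour part `(?:0[1-9]|1[0-2])` and minute part `[0-5]\d` show the "ranges" approach: a regex can only constrain the shape of a timestamp, so checking that a date really exists would need a date parser afterwards.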