Parsing file and regexp
Hello,

I need to extract information from some text files, and I want to do it with Perl!

The file I need to parse has the following layout:

keywordA word1, word2, word3;
Here we can have some free text
...
...
keywordB word4, word5, word6, word7, word8, word9, word10;
KeywordA word1, word2;
...

I want to extract all the "keywords" with their associated words. For example, with this file, I would like to get:

keywordA: (word1, word2, word3)
keywordB: (word4, word5, word6, word7, word8, word9, word10)
keywordA: (word1, word2)

Is it possible to do this with a regular expression? Or should I write a small parser?

I have tried pattern matching with the 's' option and also with the 'm' option, but without good results.

Thanks for your help!

Olivier

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: Parsing file and regexp
(Sorry, I am having problems with my ISP, so I am reposting.)

Uri Guttman wrote:
> how do you know when a keyword section begins or ends? how large is this
> file? could free text have keywords? i see a ; to end a word list but
> that isn't enough to properly parse this if you have 'free text'.
>
> osc> Is it possible to do this with regular expression ?
> osc> Or should I write a small parser ?
>
> yes and yes.
>
> osc> I have tried pattern matching with the 's' and also with the 'm'
> osc> option,
> osc> but with no good result ...
>
> please show your code. there is no way to help otherwise. s/// is not a
> pattern matcher but a substitution operator. it uses regexes and can be
> used to parse things.
>
> uri

Hi Uri,

Sorry, the code is at my office.

The free text cannot contain keywords, and keywords always start at the beginning of a line. The list of words is terminated by a ";".

For the pattern matching I used the s option, m/pattern/s, so that the match can run across the \n characters.

Olivier
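The three rules stated above (keywords start a line, the word list ends at ";", free text never contains a keyword) can be turned into a single pattern. What follows is only a minimal sketch under some assumptions: the set of keywords is known in advance, the names @keywords, $alternation and @records are made up for illustration, and the sample text is rebuilt from the layout in the first message, with the keywordB list wrapped over two lines to show that newlines inside a list are tolerated.

```perl
use strict;
use warnings;

# Hypothetical keyword set (an assumption); replace with the real keywords.
my @keywords    = qw(keywordA keywordB);
my $alternation = join '|', map { quotemeta($_) } @keywords;

# Sample input modelled on the layout described in the thread.
my $source = <<'END';
keywordA word1, word2, word3;
Here we can have some free text
...
keywordB word4, word5, word6, word7,
word8, word9, word10;
keywordA word1, word2;
END

my @records;
# /m lets ^ match at the start of every line; /g walks through all matches.
# [^;]* captures everything up to the terminating ";", newlines included.
while ($source =~ /^($alternation)\s+([^;]*);/mg) {
    my ($keyword, $list) = ($1, $2);
    $list =~ s/\s+\z//;                      # drop whitespace before the ";"
    my @words = split /\s*,\s*/, $list;      # commas may be followed by newlines
    push @records, [ $keyword, @words ];
}

for my $record (@records) {
    my ($keyword, @words) = @$record;
    print "$keyword: (", join(', ', @words), ")\n";
}
# prints:
# keywordA: (word1, word2, word3)
# keywordB: (word4, word5, word6, word7, word8, word9, word10)
# keywordA: (word1, word2)
```

Anchoring on the known keywords matters here: a generic ^(\w+) would also pick up the first word of a free-text line and swallow everything up to the next ";".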
Re: Parsing file and regexp
Uri Guttman wrote:
> please show your code. there is no way to help otherwise. s/// is not a
> pattern matcher but a substitution operator. it uses regexes and can be
> used to parse things.
>
> uri

Here it is:

$ cat test.txt
keyword1 word1, word2 word3;
blabla blabla
keyword2 word4, word5, word6, word7, word8, word9;
bla bla bla bla
keyword1 word10, word11;

$ cat parse.pl
use warnings;
open FILE, "< test.txt" or die "Could not open $!";
$/ = undef;
$source = <FILE>;
close(FILE);
if ($source =~ m/keyword1\s*(\w*)(,\w*)*/s) {
    print("Match !\n");
    print("$1\n");
    print("$2\n");
}

$ perl parse.pl
Match !
word1
,

Here I would like to get 2 matches:

word1, word2 word3;

and

word10, word11;

Thanks for your help!

Olivier
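One way to read the output above: without the g modifier the match stops at the first keyword1, and a repeated group such as (,\w*)* keeps only its last repetition, which here is a bare "," because \w* cannot cross the space that follows the comma. A minimal sketch of an alternative is shown below. It assumes the contents of test.txt are already in a string (the here-doc stands in for the file, and its line breaks are a guess), and the name @lists is made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the slurped contents of test.txt (line breaks are an assumption).
my $source = <<'END';
keyword1 word1, word2 word3;
blabla blabla
keyword2 word4, word5, word6, word7, word8, word9;
bla bla bla bla
keyword1 word10, word11;
END

# In list context, a match with /g returns the captured group of every match.
# /m anchors ^ at each line start; [^;]* runs up to the closing ";".
my @lists = $source =~ /^keyword1\s+([^;]*;)/mg;

print "$_\n" for @lists;
# prints:
# word1, word2 word3;
# word10, word11;
```

The s modifier is not needed in this version, since the pattern contains no "." and a negated character class like [^;] already matches newlines.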