Parsing file and regexp

2010-02-13 Thread olivier.scalb...@algosyn.com
Hello,

I need to extract information from some text files, and I want to do it
with Perl.

The file I need to parse has the following layout:

keywordA word1, word2, word3;

Here we can have some free text
...
...

keywordB word4,
  word5, word6, word7, word8,
  word9, word10;

keywordA
  word1, word2;

...

I want to extract all the "keywords" with their associated words.
For example, with this file, I would like to have:
keywordA: (word1, word2, word3)
keywordB: (word4, word5, word6, word7, word8, word9, word10)
keywordA: (word1, word2)

Is it possible to do this with a regular expression?
Or should I write a small parser?

I have tried pattern matching with the 's' option and also with the 'm'
option, but without good results.

Thanks for your help!

Olivier


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Parsing file and regexp

2010-02-19 Thread olivier.scalb...@algosyn.com
(Sorry, I have a problem with my ISP, so I am reposting.)

Uri Guttman wrote:

> > how do you know when a keyword section begins or ends? how large is this
> > file? could free text have keywords? i see a ; to end a word list but
> > that isn't enough to properly parse this if you have 'free text'.
> >
> >   osc> Is it possible to do this with regular expression ?
> >   osc> Or should I write a small parser ?
> >
> > yes and yes.
> >
> >   osc> I have tried pattern matching with the 's' and also with the 'm'
> >   osc> option,
> >   osc> but with no good result ...
> >
> > please show your code. there is no way to help otherwise. s/// is not a
> > pattern matcher but a substitution operator. it uses regexes and can be
> > used to parse things.
> >
> > uri
> >

Hi Uri,

Sorry, the code is at my office.

The free text cannot contain keywords, and keywords start at the
beginning of a line. The list of words is terminated by a ";".

For the pattern matching I used the s option, m/pattern/s, so that the
match can span the different \n.
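
Under those rules (keyword at the start of a line, list closed by ";"), a
minimal sketch of the extraction might look like the one below. The
@keywords list and the inline sample text are assumptions taken from the
layout in the first message; the real file may need a different keyword
set. The /m flag lets ^ match at the start of every line. \s and [^;]
already match newlines, so /s is only needed when "." itself has to cross
a line break.

```perl
use strict;
use warnings;

# Sample input, copied from the layout described in the first message.
my $source = <<'END';
keywordA word1, word2, word3;

Here we can have some free text

keywordB word4,
  word5, word6, word7, word8,
  word9, word10;

keywordA
  word1, word2;
END

# Assumption: the set of keywords is known in advance.
my @keywords    = qw(keywordA keywordB);
my $alternation = join '|', map { quotemeta($_) } @keywords;

my @results;
# /m: ^ matches at the start of every line; /g: find every occurrence.
# [^;]* consumes everything, newlines included, up to the closing ";".
while ($source =~ /^($alternation)\b\s*([^;]*);/mg) {
    my ($keyword, $list) = ($1, $2);
    # Split the captured list on commas and whitespace (spaces, newlines).
    my @words = grep { length } split /[\s,]+/, $list;
    push @results, "$keyword: (" . join(', ', @words) . ")";
}

print "$_\n" for @results;
```

With the sample text above this prints one "keyword: (words)" line per
block, in file order.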

Olivier






Re: Parsing file and regexp

2010-02-19 Thread olivier.scalb...@algosyn.com
Uri Guttman wrote:
> > please show your code. there is no way to help otherwise. s/// is not a
> > pattern matcher but a substitution operator. it uses regexes and can be
> > used to parse things.
> >
> > uri
> >

Here it is ...

$ cat test.txt
keyword1 word1, word2
  word3;
  blabla

  blabla


keyword2
  word4, word5,
  word6, word7, word8,
  word9;

  bla bla
  bla bla

keyword1
  word10, word11;


$ cat parse.pl
use warnings;

open FILE, "< test.txt" or die "Could not open $!";
$/ = undef;
$source = <FILE>;
close(FILE);


if ($source =~ m/keyword1\s*(\w*)(,\w*)*/s) {
print("Match !\n");
print("$1\n");
print("$2\n");
}

$ perl parse.pl
Match !
word1
,


Here I would like to have 2 matches:
word1, word2
  word3;
and word10, word11;
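
One possible way to get those two matches, as a sketch rather than a
definitive answer: in the pattern above, (,\w*)* stops at the space that
follows the first comma, and a repeated group keeps only its last
iteration, which is why $2 holds just ",". Capturing everything up to and
including the ";" with [^;]*; avoids both problems. The inline copy of
test.txt and the @lists variable are illustrative assumptions.

```perl
use strict;
use warnings;

# Inline copy of test.txt so the sketch runs on its own.
my $source = <<'END';
keyword1 word1, word2
  word3;
  blabla

keyword2
  word4, word5,
  word6, word7, word8,
  word9;

  bla bla

keyword1
  word10, word11;
END

# In list context, /g returns the capture from every match.
# /m anchors ^ at each line start, so keyword1 must begin a line.
my @lists = $source =~ /^keyword1\s+([^;]*;)/mg;

print "Match: $_\n" for @lists;
```

Each element of @lists is the raw text of one word list, line breaks
included; splitting it on /[\s,;]+/ would give the individual words.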



Thanks for your help!

Olivier



