Adriano Allora wrote:
hi to all,
Hi Adriano. Read my comments in-line and my solution at the end.
I've got a list of tagged words, like this one (only a little bit longest):
<tLn nr=11>
e CON e
le DET:def il
ha VER:pres avere|riavere
detto VER:pper dire
< NOM <unknown>
CORR VER:infi corre
> NOM <unknown>
e CON e
a PRE a
Posting a short example is fine, as long as the full data doesn't contain any records significantly different from any of those in your sample. See later.
I need to transform the list below in (in which the CORR tag isn't tagged):
<tLn nr=11>
e CON e
le DET:def il
ha VER:pres avere|riavere
detto VER:pper dire
<CORR>
e CON e
a PRE a
So I tried to write this awful script:
Don't think I'm not going to tell you it's awful just because you've said so already :)
#!/usr/bin/perl -w
use strict;
and always use warnings; as well. It didn't show any problems in this case but it's a useful thing to have around.
$^I = '';
Better not to use this at all while you're testing, as you'll keep overwriting the input data and have to restore it.
my $tic = 0; my $toc = 0;
You're getting confused by having two flags when you need only one: you're either inside a tag or you're not.
while(<>) {
if(/^< NOM <unknown>.*/i)
No need for the .* at the end of the regex. And isn't it unnecessarily long? Unless there are other records that look similar in your source data that you don't want to match here I'd say if (/^<\s/) { was adequate. (Starts with a left angle bracket followed by white space.)
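To see what the shorter pattern picks out, here is a small sketch run against lines from the sample above. The sub name opens_tag is made up for illustration, and the fields are assumed to be separated by white space as in the posted sample.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): true when a line
# starts with a lone '<' followed by white space, as in "< NOM <unknown>".
sub opens_tag {
    my ($line) = @_;
    return $line =~ /^<\s/ ? 1 : 0;
}

# Lines taken from the sample data in the post.
for my $line ('<tLn nr=11>', '< NOM <unknown>', 'CORR VER:infi corre', '> NOM <unknown>') {
    printf "%-22s %s\n", $line, opens_tag($line) ? 'match' : 'no match';
}
```

Only the "< NOM <unknown>" line should be reported as a match; the "<tLn nr=11>" header is left alone because its '<' is followed by a letter, not white space.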
{
$tic = 1;
next;
}
next if /^> NOM <unknown>.*/i;
next if $toc == 1;
Here is your problem. $toc gets set to 1 inside the following if statement, and from then on next if $toc == 1; skips every line before the $toc = 0; after it can run. The flag is never reset, so the loop cycles unproductively until the end of the input.
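The stuck flag can be reproduced in isolation. This is a stripped-down model, not the original script: the sub name lines_printed is invented, and the literal word CORR stands in for the "inside a tag" branch that sets $toc.

```perl
use strict;
use warnings;

# Simplified model of the original loop: only the $toc handling is kept.
sub lines_printed {
    my @lines = @_;
    my $toc = 0;
    my @printed;
    for my $line (@lines) {
        next if $toc == 1;            # once $toc is 1, every later line is skipped...
        $toc = 0;                     # ...so this reset is never reached again
        $toc = 1 if $line eq 'CORR';  # stands in for the branch that sets $toc
        push @printed, $line;
    }
    return @printed;
}

print join(' ', lines_printed(qw(e le CORR e a))), "\n";   # prints: e le CORR
```

Everything after the first correction is dropped, which matches the symptom described in the question.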
$toc = 0;
if($tic==1) {
s/^(\/?\w+).+/$1/gi;
There's no need for the /i modifier as you don't have any literal letters to match. And there's no need for the /g: you have no /m, so ^ can match only once. (The line has only one beginning!) It's also clearer to use different delimiters when you're matching slashes instead of escaping them, like this: s|^(/?\w+).+|$1|;
It looks like your records can also begin with a slash. You didn't show us any data like that. This is the sort of thing I was talking about earlier: your example data can be a subset of the real thing but it needs to be fully representative.
And in the end all you've done is strip off the tail of the line after the first word. As far as I can tell s/\s.*//; is adequate. (Remove everything at and after the first white space.)
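A quick sketch of that simpler substitution, applied to two lines from the sample. The helper name first_word is hypothetical, and the input strings are assumed to have had their newline removed already.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first white-space-delimited field.
sub first_word {
    my ($line) = @_;
    $line =~ s/\s.*//;   # drop everything from the first white space onward
    return $line;
}

print first_word('CORR VER:infi corre'), "\n";   # CORR
print first_word('detto VER:pper dire'), "\n";   # detto
```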
chomp();
A bit late to be doing the chomp here, isn't it? Put it straight after the while statement and then we know where we are for the entirety of the loop. Note that the substitution in the line before leaves the trailing newline in place, because . doesn't match a newline without /s, so you do need the chomp somewhere before you wrap the word in angle brackets. By the way, you don't need the parentheses.
$_ = "<$_>";
It's a misconception to try to turn the actual input record into what the output should be; after all, you didn't say $_='' above when the output should be blank, you simply called next. Simply build the output record that you want and print it. You'll see this in my solution below.
$toc = 1;
$tic = 0;
}
s/<>//g;
What's this for? If it's to fix a bug in your code that's generating '<>' when it shouldn't then you should fix the bug instead. Or if there are records in your source data that need this stripping out then you haven't shown any. Otherwise I don't understand its purpose.
print;
}
it doesn't return errors, but it stop printing the output after the first correction. Someone can explain me why and eventually suggest how to correct the corrector?
Yep, awful script. Mainly because of unclear thinking, I believe. If I were to ask you what states $tic and $toc represented you couldn't easily tell me, and because you don't really know what they do (they are yours after all, so you should know what they're for!) you've tried setting and clearing them at odd places and failed to get it right.
Thanks at all,
alladr
PS: another strange thing: if I declare at the beginning of the script: my($tic,$toc); it returns me an error...
It would have been nice if you had told us the error, but my guess is that it's the warning '"my" variable $tic masks earlier declaration in same scope', because you will then have declared them twice.
Here's the fix. It has a single flag $tag, and it means the script is processing 'inside' a tag. It's set when a line beginning with a lone '<' is found and cleared when the corresponding '>' appears.
HTH,
Rob

use strict;
use warnings;

#$^I = '';

my $tag;

while (<>) {
  chomp;
  if (/^[<>]\s/) {
    $tag = /^</;
  }
  elsif ($tag) {
    /(^\S+)/;
    print "<$1>\n";
  }
  else {
    print "$_\n";
  }
}
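For checking the single-flag logic without touching any files, the same loop can be wrapped in a sub and fed the sample from the question. This is only a sketch: the name collapse_tags and the in-memory list are assumptions for illustration, and the lines are assumed to be already chomped.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above so it can run on an
# in-memory list; $tag is true while inside a "< ... >" pair.
sub collapse_tags {
    my @out;
    my $tag;
    for my $line (@_) {
        if ($line =~ /^[<>]\s/) {
            $tag = $line =~ /^</;             # '<' opens, '>' closes; nothing is output
        }
        elsif ($tag) {
            my ($word) = $line =~ /^(\S+)/;   # first field of the tagged line
            push @out, "<$word>";
        }
        else {
            push @out, $line;                 # ordinary line, passed through unchanged
        }
    }
    return @out;
}

# Sample data from the question, one record per element.
my @sample = (
    '<tLn nr=11>', 'e CON e', 'le DET:def il', 'ha VER:pres avere|riavere',
    'detto VER:pper dire', '< NOM <unknown>', 'CORR VER:infi corre',
    '> NOM <unknown>', 'e CON e', 'a PRE a',
);
print "$_\n" for collapse_tags(@sample);
```

On the sample this should give the requested output, with the three lines around CORR collapsed into a single <CORR> line and the rest unchanged.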