Hi,

What about this solution:

use warnings;
use strict;

my $str = 'chr1    ucsc    exon    226488874       226488906       0.000000
-       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    226496810       226497198       0.000000
-       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    2005086 2005368 0.000000        +       .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1    ucsc    exon    2066701 2066786 0.000000        +       .
gene_id "NM_001033581"; transcript_id "NM_001033581";';

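# Collect the accession from every line that has one. Duplicates are harmless:
# once an accession has been tagged, it is no longer directly followed by a quote.
my @patterns = map { /(NM_\d+)"/ ? $1 : () } grep { /NM_\d+"/ } split /\n+/, $str;
my $additional = 12345;
foreach my $id (@patterns) {
    # Move to the next number only when the substitution actually changed something.
    $str =~ s/(\Q$id\E)"/$1:$additional"/g and $additional++;
}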
print "$str\n";
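
The same idea can probably be done in a single pass. This is only a minimal sketch: the two sample rows are a trimmed copy of your data, and the names %suffix_for and $next are made up for the example. It uses the /e modifier so the replacement is evaluated as Perl code, and a hash to remember which number each accession received.

```perl
use strict;
use warnings;

# Two sample rows in the same format as the data (hypothetical trimmed copy).
my $str = join "\n",
    qq{chr1\tucsc\texon\t226488874\t226488906\t0.000000\t-\t.\tgene_id "NM_173083"; transcript_id "NM_173083";},
    qq{chr1\tucsc\texon\t2005086\t2005368\t0.000000\t+\t.\tgene_id "NM_001033581"; transcript_id "NM_001033581";};

my %suffix_for;       # accession => number assigned to it
my $next = 12345;     # first number to hand out

# Assign a number the first time an accession is seen and reuse it afterwards.
$str =~ s/"(NM_\d+)"/ '"' . $1 . ':' . ($suffix_for{$1} ||= $next++) . '"' /ge;

print "$str\n";
```

Because every quoted accession is rewritten exactly once, there is no need to collect the patterns first or to worry about running the same substitution twice.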


Regards,
Katya

-----Original Message-----
From: Richard Green [mailto:gree...@uw.edu] 
Sent: Saturday, February 26, 2011 10:07 PM
To: beginners@perl.org
Subject: string substitution command question

Hi Perl users, quick question: I have one long string of tab-delimited
values, with rows separated by newline characters.
Here is a snippet of the string:

chr1    ucsc    exon    226488874       226488906       0.000000
-       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    226496810       226497198       0.000000
-       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    2005086 2005368 0.000000        +       .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1    ucsc    exon    2066701 2066786 0.000000        +       .
gene_id "NM_001033581"; transcript_id "NM_001033581";

I am trying to perform a substitution on some values at the end of each row.
For example, I'm trying to turn the above string into the following:

chr1    ucsc    exon    226488874       226488906       0.000000
-       .       gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1    ucsc    exon    226496810       226497198       0.000000
-       .       gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1    ucsc    exon    2005086 2005368 0.000000        +       .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1    ucsc    exon    2066701 2066786 0.000000        +       .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";

Here is the substitution command I am trying to use:

$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
\"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
\"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;

I don't know why I am not able to substitute at the end of each row in the
string.
Any suggestions folks have are much appreciated. Thanks -Rich


