On Mon, Oct 01, 2012 at 11:15:53PM +0100, Florian Huber wrote:
> Dear all,
Hello,

> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"

I would suggest that you show us the real data. I'm assuming that
'NOTNEEDED' is a placeholder for some data that you're not interested
in. Without knowing what that is we can't really say for sure what is
going on (though we can speculate; see below).

Note that you should be using the strict and warnings pragmas (see
below). The lack of 'my' here suggests that you probably aren't.

> But when I do
>
> $string =~ /[ACGT]/;
>
> it matches only the last letter, i.e. "G". Why doesn't it start
> at the beginning?

It isn't matching the last letter. You are probably making the wrong
assumption; this is common when you're having trouble with code.
Again, show us the 'NOTNEEDED' part. :)

> But it gets even better, I figured that adding the greedy *
> should help:
>
> $string =~ /[ACGT]*/;
>
> and now it doesn't match anything. Shouldn't it try to match as
> many times as possible?

It does match; it just matches the empty string. '*' means "zero or
more", and the regex engine returns the leftmost position where the
pattern can succeed before it worries about length. Your string starts
with '/', which is not in [ACGT], so /[ACGT]*/ succeeds immediately at
offset 0 by matching zero characters. Greediness only decides how far
a match extends once it has started; it doesn't make the engine skip
ahead looking for a longer one. Use '+' ("one or more") if you need at
least one character.

> My confusion was complete when I tried
>
> $string =~ /[ACGT]{5}/;
>
> now it matches 5 letters, but this time from the beginning,
> i.e.: ACGAC.

I'm guessing that the first 'NOTNEEDED' contains a 'G'. Since the
engine returns the leftmost match, that would explain the first
result. The second result is the empty match described above. If
'NOTNEEDED' doesn't contain a run of at least 5 characters composed
only of 'A', 'C', 'G', or 'T', then that would explain this last
result.

> I fail to understand that behaviour. I checked the Perl
> documentation a bit and I sort of understand why /[ACGT]/ only
> matches one letter only (but not why it starts at the end).
> However, I'm simply puzzled at the other things.

As I said, if your problems persist, provide us with a full (minimal)
program that demonstrates them. Assuming 'NOTNEEDED' cannot contain
'/' characters, you may need to include those in your pattern to make
sure you match the parts you want.
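If it helps, here is a minimal sketch that prints where each of those
patterns actually matches. I don't know what your 'NOTNEEDED' really
is, so the 'BIG' in the first field is made up: I'm ASSUMING it
contains a lone 'G' and no run of five or more A/C/G/T characters.
$-[0] is the offset where the last successful match started and $& is
the text it matched (see perldoc perlvar).

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical data: 'BIG' stands in for the unknown first 'NOTNEEDED'
# field (ASSUMED to contain a 'G' but no run of 5+ A/C/G/T characters).
my $string = '/BIG/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/';

for my $pattern ('[ACGT]', '[ACGT]*', '[ACGT]{5}', '[ACGT]+') {
    if ($string =~ /$pattern/) {
        # The engine reports the LEFTMOST position where the pattern can
        # succeed; greediness only decides how long that match is.
        printf "%-10s matched '%s' at offset %d\n", $pattern, $&, $-[0];
    }
}

# Anchoring on the literal '*' skips the stray 'G' in the first field.
my ($seq) = $string =~ /\*([ACGT]+)/;
print "sequence: $seq\n";
```

With that made-up string it should print:

    [ACGT]     matched 'G' at offset 3
    [ACGT]*    matched '' at offset 0
    [ACGT]{5}  matched 'ACGAC' at offset 6
    [ACGT]+    matched 'G' at offset 3
    sequence: ACGACGGGTTCAAGGCAG

Note that /[ACGT]+/ on its own still stops at the stray 'G'; it's
putting the '*' (or the '/' characters) in the pattern that gets you
to the right place.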
You will probably want to use captures to pull out the part you want
(see perldoc perlre). To understand the program below you will also
need to understand the /x modifier (again, see perldoc perlre).

    #!/usr/bin/perl

    use strict;   # <---Make sure you have these.
    use warnings; # <--/

    my $string = '/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/';

    my ($match) = $string =~ m,
            ^           # Beginning of string.
            /           # Skip over the first '/'.
            [^/]*       # Skip over anything that's not a '/'.
            /           # Until the next '/'. Skip over that too.
            \*          # Skip over the literal '*' character.
            ([ACGT]+)   # Now capture the sequence we want.
            ,x;

    print $match, "\n";

    __END__

    Output:

    ACGACGGGTTCAAGGCAG

IF the '*' characters literally delimit the parts that you want (AND
not the parts that you don't want), then that's even easier:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $string = '/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/';

    my ($match) = $string =~ /\*([ACGT]+)/;

    print $match, "\n";

    __END__

This produces the same output with this sample string. Without seeing
the real data it's hard to speculate; there might be a better way. You
need to know the specification of the data you're processing if you
want to process it reliably and automatically. We need to know it to
help you do that too.

o o o o

A lot of people seem to post about this same type of data. I'd be
surprised if nobody has written CPAN modules for parsing it yet (and
if not, then perhaps it would be economical to do so). Just saying...

Regards,

--
Brandon McCaig <bamcc...@gmail.com> <bamcc...@castopulence.org>
Castopulence Software <https://www.castopulence.org/>
Blog <http://www.bamccaig.com/>
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }. q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.}; tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'