hi john, thanks a lot!! the code worked...:-)
On 5/9/05, John Doe <[EMAIL PROTECTED]> wrote:
>
> On Monday, 9 May 2005 at 18:52, Aditi Gupta wrote:
> > hi,
> > the fields cannot be split using /\s/ because some fields have common
> > boundaries, i.e. some fields are from column 31-38 and the next field
> > starts from 39. As in
> >
> > COLUMNS   DATA TYPE      FIELD        DEFINITION
> > ------------------------------------------------------------------
> >  1 -  6   Record name    "ATOM  "
> >  7 - 11   Integer        serial       Atom serial number.
> >
> > these are the 1st two fields.
>
> Then use something like [untested]
>
> foreach (@lines) {
>     # is the record relevant?
>     next unless /^ATOM\s*\d+\s*CA/o;
>     # extract three values from fixed positions:
>     # x starts at column 31, so skip the first 30 characters
>     my ($x, $y, $z) = /^.{30}(.{8})(.{8})(.{8})/o;
> }
>
> joe
>
> > On 5/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > ----- Original Message -----
> > > From: Aditi Gupta <[EMAIL PROTECTED]>
> > > Date: Monday, May 9, 2005 11:41 am
> > > Subject: extracting coordinates
> > >
> > > > Hi everyone,
> > >
> > > Hello Aditi,
> > >
> > > > That code is working...
> > > > But my specific problem is as follows:
> > > >
> > > > i have a file in which data is stored as
> > > >
> > > > HELIX 4 4 VAL 74 LEU 84 1 11
> > > > CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
> > > > ORIGX1 1.000000 0.000000 0.000000 0.00000
> > > > ORIGX2 0.000000 1.000000 0.000000 0.00000
> > > > ORIGX3 0.000000 0.000000 1.000000 0.00000
> > > > SCALE1 0.030285 0.000000 0.000000 0.00000
> > > > SCALE2 0.000000 0.029630 0.000000 0.00000
> > > > SCALE3 0.000000 0.000000 0.013215 0.00000
> > > > ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
> > > > ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
> > > > ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
> > > > ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
> > > > ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
> > > > ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
> > > > ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
> > > > ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
> > > > ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
> > > > ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
> > > > ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
> > > >
> > > > the fields of records having ATOM as 1st field are as follows:
> > > >
> > > > COLUMNS   DATA TYPE      FIELD        DEFINITION
> > > > ------------------------------------------------------------------
> > > >  1 -  6   Record name    "ATOM  "
> > > >  7 - 11   Integer        serial       Atom serial number.
> > > > 13 - 16   Atom           name         Atom name.
> > > > 17        Character      altLoc       Alternate location indicator.
> > > > 18 - 20   Residue name   resName      Residue name.
> > > > 22        Character      chainID      Chain identifier.
> > > > 23 - 26   Integer        resSeq       Residue sequence number.
> > > > 27        AChar          iCode        Code for insertion of residues.
> > > > 31 - 38   Real(8.3)      x            Orthogonal coordinates for X
> > > >                                       in Angstroms.
> > > > 39 - 46   Real(8.3)      y            Orthogonal coordinates for Y
> > > >                                       in Angstroms.
> > > > 47 - 54   Real(8.3)      z            Orthogonal coordinates for Z
> > > >                                       in Angstroms.
> > > > 55 - 60   Real(6.2)      occupancy    Occupancy.
> > > > 61 - 66   Real(6.2)      tempFactor   Temperature factor.
> > > > 73 - 76   LString(4)     segID        Segment identifier,
> > > >                                       left-justified.
> > > > 77 - 78   LString(2)     element      Element symbol,
> > > >                                       right-justified.
> > > > 79 - 80   LString(2)     charge       Charge on the atom.
> > > >
> > > > I have to get the x,y,z coordinates of records whose atom name is
> > > > 'CA' (highlighted as blue).
> > > >
> > > > I wrote some code but it's giving many errors..
> > > >
> > > > The code is:
> > > >
> > > > #!usr/bin/perl
> > > > use warnings;
> > > >
> > > > $filename = "1a32.txt";
> > > > chomp $filename;
> > >
> > > The above line is unnecessary; see perldoc -f chomp
> > >
> > > > open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> > > > @file= <FILEHANDLE>;
> > > > close (FILEHANDLE);
> > > >
> > > > $a= "ATOM";
> > > > $c= "CA";
> > > >
> > > > foreach $line(@file)
> > > > {
> > > > if(my $line =~ /^/$a/\s*
> > > > (\s*\d+)
> > > > \s*/$c/\s*
> > > > \d*
> > > > \w+
> > > > \s
> > > > \w
> > > > (\s*\d+)
> > > > \w*
> > > > (\s*\d*)
> > > > (\s*\d*)
> > > > (\s*\d*)
> > > > (\s*\d*)
> > > > (\s*\d*)
> > > > (\w*\s*)
> > > > (\s*\w*)
> > > > (\s*\w*)/)
> > >
> > > Youch, that is way too long of a regular expression [ at least for
> > > me ]; you may consider a shorter version such as
> > > my @fields = $line =~ /(\S+)/g. In any case I think split would work
> > > best here [ my @fields = split ' ', $line ], since your fields are
> > > locked into place. In general you pad data until it is all uniform,
> > > as yours is. Below is some simple code that should help you on your
> > > way; feel free to modify at will.
> > >
> > > > {
> > > > my $x= substr($line,30,8);
> > > > my $y= substr($line,38,8);
> > > > my $z= substr($line,46,8);
> > > >
> > > > print "$x\t$y\t$z\n";
> > > > }
> > > > }
> > > >
> > > > #-------------------------------------------------
> > > >
> > > > The errors that i'm getting are:
> > > >
> > > > Scalar found where operator expected at two.pl line 15, near "/^/$a"
> > > >     (Missing operator before $a?)
> > > > Unrecognized escape \d passed through at two.pl line 15.
> > > > Unrecognized escape \s passed through at two.pl line 15.
> > > > Unrecognized escape \w passed through at two.pl line 17.
> > > > Unrecognized escape \s passed through at two.pl line 17.
> > > > Unrecognized escape \w passed through at two.pl line 17.
> > > > Unrecognized escape \s passed through at two.pl line 17.
> > > > Backslash found where operator expected at two.pl line 22, near "(\s*\"
> > > >     (Might be a runaway multi-line ** string starting on line 17)
> > > >     (Missing operator before \?)
> > > > Unquoted string "d" may clash with future reserved word at two.pl line 22.
> > > > Backslash found where operator expected at two.pl line 23, near ")
> > > > \"
> > > >     (Missing operator before \?)
> > > > Unquoted string "w" may clash with future reserved word at two.pl line 23.
> > > > Unrecognized escape \s passed through at two.pl line 24.
> > > > Backslash found where operator expected at two.pl line 25, near "(\s*\"
> > > >     (Might be a runaway multi-line ** string starting on line 24)
> > > >     (Missing operator before \?)
> > > > Unquoted string "d" may clash with future reserved word at two.pl line 25.
> > > > Unrecognized escape \s passed through at two.pl line 26.
> > > > Backslash found where operator expected at two.pl line 27, near "(\s*\"
> > > >     (Might be a runaway multi-line ** string starting on line 26)
> > > >     (Missing operator before \?)
> > > > Unquoted string "d" may clash with future reserved word at two.pl line 27.
> > > > Unrecognized escape \w passed through at two.pl line 28.
> > > > Backslash found where operator expected at two.pl line 29, near "(\w*\"
> > > >     (Might be a runaway multi-line ** string starting on line 28)
> > > >     (Missing operator before \?)
> > > > Unrecognized escape \w passed through at two.pl line 29.
> > > > syntax error at two.pl line 15, near "/^/$a"
> > > > Substitution replacement not terminated at two.pl line 31.
> > > >
> > > > Please help me..
> > >
> > > #!/usr/bin/perl
> > > use warnings;
> > > use strict;
> > > my $filename = "1a32.txt";
> > > my $Atom = 'CA';
> > >
> > > open (FILEHANDLE, $filename) or die "couldn't open $filename: $!";
> > > my @file = <FILEHANDLE>;
> > > close (FILEHANDLE);
> > >
> > > foreach my $line ( @file ){
> > >     # split ' ' skips leading whitespace and collapses runs of spaces
> > >     my @fields = split ' ', $line;
> > >     next unless @fields >= 6 and $fields[0] eq 'ATOM';
> > >     # the last three fields are occupancy, tempFactor and element,
> > >     # so x, y, z sit at positions -6, -5, -4
> > >     print "X: $fields[-6] Y: $fields[-5] Z: $fields[-4]\n"
> > >         if uc $fields[2] eq $Atom;
> > > }
> > >
> > > HTH,
> > > Mark G.
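
For anyone reading this thread later: the fixed-column approach discussed above (substr at offsets 30/38/46, or a regex that skips 30 characters) can also be written with unpack. The following is only a sketch under stated assumptions, not the code anyone in the thread actually ran. The sub names atom_coords and make_atom_line are made up for illustration, and make_atom_line exists only to build correctly aligned test records from the column table quoted above (altLoc, chainID and iCode are left blank, which is an assumption about the data).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo helper: build a fixed-width ATOM record that follows the
# column table quoted in the thread (altLoc, chainID, iCode left blank).
sub make_atom_line {
    my ($serial, $name, $res, $seq, $x, $y, $z, $occ, $temp, $elem) = @_;
    return sprintf "%-6s%5d %-4s %3s  %4d    %8.3f%8.3f%8.3f%6.2f%6.2f%10s%2s",
        'ATOM', $serial, $name, $res, $seq, $x, $y, $z, $occ, $temp, '', $elem;
}

# Return (x, y, z) for an ATOM record whose atom name equals $wanted,
# or an empty list for any other line.
sub atom_coords {
    my ($line, $wanted) = @_;
    return unless length($line) >= 54;               # z ends at column 54
    return unless substr($line, 0, 6) eq 'ATOM  ';   # columns 1-6
    # x12: skip columns 1-12, A4: atom name (13-16),
    # x14: skip columns 17-30, A8 x 3: x, y, z (31-54).
    my ($name, $x, $y, $z) = unpack 'x12 A4 x14 A8 A8 A8', $line;
    $name =~ s/^\s+//;                               # "A" strips trailing spaces only
    return unless $name eq $wanted;
    return map { $_ + 0 } $x, $y, $z;                # numify, drops the padding
}

my @lines = (
    make_atom_line( 1, ' N  ', 'LEU', 2, -10.586, -14.055, 54.397, 1.00, 49.37, 'N'),
    make_atom_line( 2, ' CA ', 'LEU', 2,  -9.711, -13.341, 53.419, 1.00, 48.40, 'C'),
    make_atom_line(10, ' CA ', 'THR', 3, -10.405,  -9.634, 52.826, 1.00, 40.93, 'C'),
);

for my $line (@lines) {
    my @xyz = atom_coords($line, 'CA') or next;      # skip non-CA records
    print join("\t", @xyz), "\n";
}
```

The reason to prefer fixed offsets over split is the one Aditi gives: x, y and z are adjacent 8-character fields, so a value such as -114.055 fills its whole field and touches its neighbour, and whitespace splitting would then merge two numbers into one token.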