Hi,

The fields cannot be split using /\s/ because some fields are directly
adjacent, i.e. one field occupies columns 31-38 and the next field starts
at column 39. As in:

COLUMNS        DATA TYPE       FIELD         DEFINITION
---------------------------------------------------------------------------------
 1 -  6        Record name     "ATOM  "
 7 - 11        Integer         serial        Atom serial number.

These are the first two fields.

On 5/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> ----- Original Message -----
> From: Aditi Gupta <[EMAIL PROTECTED]>
> Date: Monday, May 9, 2005 11:41 am
> Subject: extracting coordinates
>
> > Hi everyone,
> Hello Aditi,
> >
> > That code is working...
> > But my specific problem is as follows:
> >
> > I have a file in which data is stored as
> >
> > HELIX 4 4 VAL 74 LEU 84 1 11
> > CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
> > ORIGX1 1.000000 0.000000 0.000000 0.00000
> > ORIGX2 0.000000 1.000000 0.000000 0.00000
> > ORIGX3 0.000000 0.000000 1.000000 0.00000
> > SCALE1 0.030285 0.000000 0.000000 0.00000
> > SCALE2 0.000000 0.029630 0.000000 0.00000
> > SCALE3 0.000000 0.000000 0.013215 0.00000
> > ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
> > ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
> > ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
> > ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
> > ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
> > ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
> > ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
> > ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
> > ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
> > ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
> > ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
> >
> > The fields of records having ATOM as the first field are as follows:
> >
> > COLUMNS        DATA TYPE       FIELD         DEFINITION
> > ---------------------------------------------------------------------------------
> >  1 -  6        Record name     "ATOM  "
> >  7 - 11        Integer         serial        Atom serial number.
> > 13 - 16        Atom            name          Atom name.
> > 17             Character       altLoc        Alternate location indicator.
> > 18 - 20        Residue name    resName       Residue name.
> > 22             Character       chainID       Chain identifier.
> > 23 - 26        Integer         resSeq        Residue sequence number.
> > 27             AChar           iCode         Code for insertion of residues.
> > 31 - 38        Real(8.3)       x             Orthogonal coordinates for X in Angstroms.
> > 39 - 46        Real(8.3)       y             Orthogonal coordinates for Y in Angstroms.
> > 47 - 54        Real(8.3)       z             Orthogonal coordinates for Z in Angstroms.
> > 55 - 60        Real(6.2)       occupancy     Occupancy.
> > 61 - 66        Real(6.2)       tempFactor    Temperature factor.
> > 73 - 76        LString(4)      segID         Segment identifier, left-justified.
> > 77 - 78        LString(2)      element       Element symbol, right-justified.
> > 79 - 80        LString(2)      charge        Charge on the atom.
> >
> > I have to get the x, y, z coordinates of records whose atom name is
> > 'CA' (highlighted in blue).
> >
> > I wrote some code but it's giving many errors.
> >
> > The code is:
> >
> > #!usr/bin/perl
> > use warnings;
> >
> > $filename = "1a32.txt";
> > chomp $filename;
>
> The above line is useless: the string has no trailing newline to remove.
> See perldoc -f chomp.
>
> > open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> > @file= <FILEHANDLE>;
> > close (FILEHANDLE);
> >
> > $a= "ATOM";
> > $c= "CA";
> >
> > foreach $line(@file)
> > {
> > if(my $line =~ /^/$a/\s*
> > (\s*\d+)
> > \s*/$c/\s*
> > \d*
> > \w+
> > \s
> > \w
> > (\s*\d+)
> > \w*
> > (\s*\d*)
> > (\s*\d*)
> > (\s*\d*)
> > (\s*\d*)
> > (\s*\d*)
> > (\w*\s*)
> > (\s*\w*)
> > (\s*\w*)/)
>
> Youch, that is way too long a regular expression [ at least for me ]; you
> may consider a shorter nested version such as my @fields =~ /([\w+\s+])/g.
> In any case I think split would work best here [ my @fields = split
> /\s/,$line ], since your fields are locked into place. In general you PAD
> data until it is all uniform, such as yours. Below is some simple
> code that should help you on your way; feel free to modify at will.
>
> > {
> >     my $x= substr($line,30,8);
> >     my $y= substr($line,38,8);
> >     my $z= substr($line,46,8);
> >
> >     print "$x\t$y\t$z\n";
> > }
> > }
> >
> > #-------------------------------------------------
> >
> > The errors that I'm getting are:
> >
> > Scalar found where operator expected at two.pl line 15, near "/^/$a"
> > (Missing operator before $a?)
> > Unrecognized escape \d passed through at two.pl line 15.
> > Unrecognized escape \s passed through at two.pl line 15.
> > Unrecognized escape \w passed through at two.pl line 17.
> > Unrecognized escape \s passed through at two.pl line 17.
> > Unrecognized escape \w passed through at two.pl line 17.
> > Unrecognized escape \s passed through at two.pl line 17.
> > Backslash found where operator expected at two.pl line 22, near "(\s*\"
> > (Might be a runaway multi-line ** string starting on line 17)
> > (Missing operator before \?)
> > Unquoted string "d" may clash with future reserved word at two.pl line 22.
> > Backslash found where operator expected at two.pl line 23, near ")
> > \"
> > (Missing operator before \?)
> > Unquoted string "w" may clash with future reserved word at two.pl line 23.
> > Unrecognized escape \s passed through at two.pl line 24.
> > Backslash found where operator expected at two.pl line 25, near "(\s*\"
> > (Might be a runaway multi-line ** string starting on line 24)
> > (Missing operator before \?)
> > Unquoted string "d" may clash with future reserved word at two.pl line 25.
> > Unrecognized escape \s passed through at two.pl line 26.
> > Backslash found where operator expected at two.pl line 27, near "(\s*\"
> > (Might be a runaway multi-line ** string starting on line 26)
> > (Missing operator before \?)
> > Unquoted string "d" may clash with future reserved word at two.pl line 27.
> > Unrecognized escape \w passed through at two.pl line 28.
> > Backslash found where operator expected at two.pl line 29, near "(\w*\"
> > (Might be a runaway multi-line ** string starting on line 28)
> > (Missing operator before \?)
> > Unrecognized escape \w passed through at two.pl line 29.
> > syntax error at two.pl line 15, near "/^/$a"
> > Substitution replacement not terminated at two.pl line 31.
> >
> > Please help me.
> >
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> my $filename = "1a32.txt";
> my $Atom='CA';
>
> open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> my @file= <FILEHANDLE>;
> close (FILEHANDLE);
>
> foreach my $line ( @file ){
>
>     my @fields = split /\s/,$line;
>     # counting from the end: element, tempFactor, occupancy, z, y, x
>     print "X: $fields[-6] Y: $fields[-5] Z: $fields[-4]\n" if uc $fields[2] eq $Atom;
>
> }
>
> HTH,
> Mark G.
>
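
For reference, here is a minimal sketch of reading the ATOM records by column position instead of by whitespace, which is what the column table above calls for. It is only an illustration under some assumptions: make_atom_line and ca_coordinates are names made up for this example, the sample lines are built with sprintf so that the columns are exact (a real script would read them from the PDB file), and the third sample record is invented to show x and y touching with no space between them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds a column-exact ATOM record for the demo.
# Layout follows the table in the thread: 1-6 record name, 7-11 serial,
# 13-16 atom name, 17 altLoc, 18-20 resName, 22 chainID, 23-26 resSeq,
# 27 iCode, 31-38 x, 39-46 y, 47-54 z.
sub make_atom_line {
    my ($serial, $name, $res, $seq, $x, $y, $z) = @_;
    return sprintf("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f",
                   'ATOM', $serial, $name, '', $res, '', $seq, '', $x, $y, $z);
}

# Hypothetical helper: returns (x, y, z) for an ATOM record whose atom
# name is CA, or an empty list for any other line.
sub ca_coordinates {
    my ($line) = @_;
    return unless length($line) >= 54;               # too short to hold z
    return unless substr($line, 0, 6) eq 'ATOM  ';   # columns 1-6
    (my $name = substr($line, 12, 4)) =~ s/\s+//g;   # columns 13-16
    return unless $name eq 'CA';
    # Skip 30 columns, then take three 8-character fields (31-38, 39-46, 47-54).
    my ($x, $y, $z) = unpack('x30 A8 A8 A8', $line);
    return map { $_ + 0 } $x, $y, $z;                # numify, drops padding
}

my @lines = (
    make_atom_line(1, ' N',  'LEU', 2, -10.586, -14.055, 54.397),
    make_atom_line(2, ' CA', 'LEU', 2,  -9.711, -13.341, 53.419),
    # Invented record: x and y run together, so splitting on whitespace fails.
    make_atom_line(3, ' CA', 'THR', 3, -10.405, -109.634, 52.826),
);

for my $line (@lines) {
    my @xyz = ca_coordinates($line) or next;   # skip non-CA lines
    print join("\t", @xyz), "\n";
}
```

This should print one tab-separated line per CA record and skip the N record. The unpack template does the same job as the three substr calls in the original script; either works, because both go by position rather than by separators.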