ATOM records have a fixed-width format, so you can (and should) use string slicing instead of split(), like so (one-liner):

serial, aname, altloc, resn, chid, resi, insCode, x, y, z, occ, b, element, q = line[6:11], line[12:16], line[16], line[17:20], line[21], line[22:26], line[26], line[30:38], line[38:46], line[46:54], line[54:60], line[60:66], line[76:78], line[78:80]

or with more explicit typing

serial, aname, altloc, resn, chid, resi, insCode, x, y, z, occ, b, element, q = int(line[6:11]), line[12:16].strip(), line[16].strip(), line[17:20].strip(), line[21], int(line[22:26]), line[26], float(line[30:38]), float(line[38:46]), float(line[46:54]), float(line[54:60]), float(line[60:66]), line[76:78].strip(), line[78:80]

Cheers,

Ed.

On Thu, 2013-06-06 at 04:37 +0000, GRANT MILLS wrote:
> Dear CCP4BB,
>
> I'm trying to write a simple python script to retrieve and manipulate
> PDB data using the following code:
>
> #for line in open("PDBfile.pdb"):
> #    if "ATOM" in line:
> #        column=line.split()
> #        c4=column[4]
>
> and then writing to a new document with:
>
> #with open("selection.pdb", "a") as myfile:
> #    myfile.write(c4+"\n")
>
> Except for if the PDB contains columns which run together such as the
> occupancy and B-factor in the following:
>
> ATOM    608  SG  CYS A  47      12.866 -28.741  -1.611  1.00201.10           S
> ATOM    609  OXT CYS A  47      14.622 -24.151  -1.842  1.00100.24           O
>
> My script seems to miscount the columns and read the two as one
> column, does anyone know how to avoid this? (PS, I've googled this
> like crazy but I either don't understand or the link is irrelevant)
>
> Any advice would help.
> Thanks for your time,
> Grant
>

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
----------------------------------------------
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
------------------------------   / Lao Tse /
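
For reference, here is a minimal sketch of the slicing approach from the reply above, wrapped in a function. The names Atom, parse_atom_line and SAMPLE_LINES are illustrative and not part of the original exchange. The two sample records are the ones from the quoted question, re-spaced on the assumption that they follow the standard fixed-width PDB column layout. Padding each line to 80 characters is an added convenience so that files with trimmed trailing blanks do not break the single-character lookups.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Fields of one ATOM/HETATM record (field names follow the reply above)."""
    serial: int
    aname: str
    altloc: str
    resn: str
    chid: str
    resi: int
    insCode: str
    x: float
    y: float
    z: float
    occ: float
    b: float
    element: str
    q: str


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-width ATOM/HETATM line into typed fields."""
    # Assumption: pad to 80 columns so short lines still slice and index safely.
    line = line.rstrip("\r\n").ljust(80)
    # Same slices as the "explicit typing" version in the reply.
    return Atom(
        serial=int(line[6:11]),
        aname=line[12:16].strip(),
        altloc=line[16].strip(),
        resn=line[17:20].strip(),
        chid=line[21],
        resi=int(line[22:26]),
        insCode=line[26],
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occ=float(line[54:60]),
        b=float(line[60:66]),
        element=line[76:78].strip(),
        q=line[78:80],
    )


# Assumed sample data: the two records from the question, laid out in
# standard PDB columns (occupancy in 55-60, B-factor in 61-66, element in 77-78).
SAMPLE_LINES = [
    "ATOM    608  SG  CYS A  47      12.866 -28.741  -1.611  1.00201.10" + " " * 11 + "S",
    "ATOM    609  OXT CYS A  47      14.622 -24.151  -1.842  1.00100.24" + " " * 11 + "O",
]

for line in SAMPLE_LINES:
    # startswith() avoids matching "ATOM" somewhere in the middle of another record.
    if line.startswith(("ATOM", "HETATM")):
        atom = parse_atom_line(line)
        # Occupancy and B-factor come out as separate floats even though
        # they touch in the text, which is where split() goes wrong.
        print(atom.chid, atom.occ, atom.b)
```

With split(), the fused "1.00201.10" in the first record stays a single token, so every later column index shifts by one; slicing by position does not depend on whitespace at all.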