So, Python question of the day: I have 2 files that I could normally just read in with asciitable, The first file is a 12 column 8000 row table that I have read in via asciitable and manipulated. The second file is enormous, has over 50,000 rows and about 20 columns. What I want to do is find the best match for (file 1 column 1 and 2) with (file 2 column 4 and 5), return all rows that match from the huge file, join them togeather and save the whole mess as a file with 8000 rows (assuming the smaller table finds one match per row) and 32=12+20 columns. So my read code so far is as follows: ------------------------------------------------- import sys import asciitable import matplotlib import scipy import numpy as np from numpy import * import math import pylab import random from pylab import * import astropysics import astropysics.obstools import astropysics.coords
x=small_file #cannot read blank values (string!) if blank insert -999.99 dat=asciitable.read(x,Reader=asciitable.CommentedHeader, fill_values=['','-999.99']) y=large_file fopen2=open('cfile2match.list','w') dat2=asciitable.read(y,Reader=asciitable.CommentedHeader, fill_values=['','-999.99']) #here are the 2 values for the small file Radeg=dat['ra-drad']*180./math.pi Decdeg=dat['dec-drad']*180./math.pi #here are the 2 values for the large file Radeg2=dat2['ra-drad']*180./math.pi Decdeg2=dat2['dec-drad']*180./math.pi for i in xrange(len(Radeg)): for j in xrange(len(Radeg2)): #select the value if it is very, very, very close if i != j and Radeg[i] <= (Radeg2[j]+0.000001) and Radeg[i] >= (Radeg2[j]-0.000001) and Decdeg[i] <= (Decdeg2[j]+0.000001) and Decdeg[i] >= (Decdeg2[j]-0.000001): fopen.write( " ".join([str(k) for k in list(dat[i])])+" "+" ".join([str(k) for k in list(dat[j])])+"\n") ------------------------------------------- Now this is where I had to stop, this is way, way too long and messy. I did a similar approach with smaller (9000 lines each) files and it worked but took awhile, the problem here is I am going to have to play with the match range to return the best result and give only one (1!) match per row for my smaller file, i.e. row 1 of small file must match only 1 row of large file..... then I just need to return them both. However, it isn't clear to me that this is the best way forward. I have been changing the xrange to low values to play with the matching, but I would appreciate any ideas. Thanks ~Elaina -- PhD Candidate Department of Physics and Astronomy Faculty of Science Macquarie University North Ryde, NSW 2109, Australia
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor