On Dec 3, 12:45 pm, Bevan Jenkins <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have recently discovered the python language and am having a lot of
> fun getting head around the basics of it.
> However, I have run into a stumbling block that I have not been able
> to overcome, so I thought I would ask for help.
> <Overview>
> I am trying to import a text file that has the following format:
> 02/01/2000 @ 00:00:00 0.983896 Q10 T2
> 03/01/2000 @ 00:00:00 0.557377 Q10 T2
> 04/01/2000 @ 00:00:00 0.508871 Q10 T2
> 05/01/2000 @ 00:00:00 0.583196 Q10 T2
> 06/01/2000 @ 00:00:00 0.518281 Q10 T2
> when there is missing data:
> 12/09/2000 @ 00:00:00 Q151 T2
> 13/09/2000 @ 00:00:00 Q151 T2
>
> I have cobbled together some code which imports the data. The next
> step is to create an array in which each column contains a years worth
> of values. Thus, if i have 6 years of data (2001-2006 inclusive),
> there will be six columns, with 365 rows (not all years have a full
> data set and may only have say 340 days of data.
> <The question>
> In the code below
> print answer[j,1] is giving me the right answer but i can't write it
> to an array.
> any suggestions welcomed.
>
> This is what I have:
> flow=[]
> flowdate=[]
> yeardate=[]
> uniqueyear=[]
> #flow_order=
> flow_rank=[]
> icount=[]
> p=[]
>
> filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
> linesep ="\n"
>
> # read in whole file
> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))
>
> for rows in yeardate:
>     if rows not in uniqueyear:
>         uniqueyear.append(rows)
>
> #print answer[:,0] #date
> flow_order=empty((0,0),dtype=float)
> #for yr in enumerate(uniqueyear):
> for iyr,yr in enumerate(uniqueyear):
>     for j, val, in enumerate (answer[:,0]):
>         flowyr=string.split(val,"/")
>         if int(flowyr[2])==int(yr):
>             print answer[j,1]
>             #flow_order =

I'm not sure what you mean by `write it to an array'. `answer' is already an array. Perhaps you could show an example that has the bad behavior you are observing, or at least an example of what you expect to get.

Also, just a couple of pointers. This:

> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])

is better written (and usually written) in Python like this:

    for line in open(filename):
        fields = line.split()

Don't use the string module; use the methods of the strings themselves.

Don't use built-in type names as variable names, as seen on this line:

> list =string.split(fields[0],"/")  # list is a built-in type

You only need to use enumerate if you actually want the index. If you don't need the index, just iterate over the sequence, e.g. use this:

> for yr in uniqueyear:

You don't need to re-create the column stack each time you get a value from the file. It is very inefficient, because each call rebuilds the whole array from the lists read so far. Build it once, after the loop. For example, change
this:

> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))

to this (note that column_stack now runs once, after the loop):

> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))

or, with the other suggested changes:

> for line in open(filename):
>     # split the line into fields
>     fields = line.split()
>     if len(fields) > 5:
>         flowdate.append(fields[0])
>         year = fields[0].split("/")[2]
>         yeardate.append(year)
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))

If I were doing this, though, I would use a dictionary (dict) where the keys are the years and the values are lists of (date, flow) pairs for that year. Something like this:

[code]
filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"

year2flows = {}
fin = open(filename)
for line in fin:
    # split the line into fields
    fields = line.split()
    if len(fields)>5:
        date = fields[0]
        year = date.split("/")[-1]  # the year stays a string, e.g. "2000"
        flow = float(fields[3])
        year2flows.setdefault(year, []).append((date, flow))
fin.close()

# This does what you were doing.
for yr in sorted(year2flows.keys()):
    for date, flow in year2flows[yr]:
        print(flow)

# If you just wanted one year though you could do something like this
# (the keys are strings, so use "2004" rather than 2004):
for date, flow in year2flows["2004"]:
    print(flow)
[/code]

The above code is untested, so I make no guarantees.

If you are using Python 2.5 or later, you might look into using defaultdict (in the collections module). It will simplify the code a bit, from this:

year2flows = {}
# bunch of stuff...
year2flows.setdefault(year, []).append((date, flow))

to this:

from collections import defaultdict
year2flows = defaultdict(list)
# bunch of stuff...
year2flows[year].append((date, flow))

Matt
-- 
http://mail.python.org/mailman/listinfo/python-list
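
P.S. Here is a rough, self-contained sketch of how the dictionary approach could lead to the one-column-per-year layout asked about. Treat it as an illustration under assumptions, not a finished solution: `parse_flows` and `flows_by_year_table` are hypothetical helper names, the first three sample lines are copied from the question while the 2001 line is invented so that there are two years to show, and short years are padded with None (plain lists are used instead of a NumPy array to keep the example dependency-free).

```python
from collections import defaultdict

# Sample records: the first three come from the question,
# the 2001 record is made up purely for illustration.
SAMPLE_LINES = [
    "02/01/2000 @ 00:00:00 0.983896 Q10 T2",
    "03/01/2000 @ 00:00:00 0.557377 Q10 T2",
    "12/09/2000 @ 00:00:00 Q151 T2",  # missing flow value: only 5 fields
    "02/01/2001 @ 00:00:00 0.508871 Q10 T2",
]


def parse_flows(lines):
    """Group (date, flow) pairs by year (hypothetical helper)."""
    year2flows = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > 5:  # skip records with a missing flow value
            date = fields[0]
            year = date.split("/")[-1]  # keys are strings such as "2000"
            year2flows[year].append((date, float(fields[3])))
    return year2flows


def flows_by_year_table(year2flows):
    """Return (years, rows) with one column per year, padded with None."""
    years = sorted(year2flows)
    nrows = max(len(year2flows[y]) for y in years) if years else 0
    rows = []
    for i in range(nrows):
        row = []
        for y in years:
            flows = year2flows[y]
            # Use the i-th flow of this year, or None if the year is shorter.
            row.append(flows[i][1] if i < len(flows) else None)
        rows.append(row)
    return years, rows


year2flows = parse_flows(SAMPLE_LINES)
years, rows = flows_by_year_table(year2flows)
print(years)
for row in rows:
    print(row)
```

To read from the real file, pass the open file object to `parse_flows` instead of `SAMPLE_LINES`. If a NumPy array is needed afterwards, the padded rows could be converted once at the end rather than inside the loop.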