On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:

> I have been working on a python script that separates mailing addresses
> into different components.
>
> Here is my code:
>
> inFile = "directory"
> outFile = "directory"
> inHandler = open(inFile, 'r')
> outHandler = open(outFile, 'w')

Are you *really* opening the same file for reading and writing at the
same time? Even if your operating system allows that, surely it's not a
good idea. You might get away with it for small files, but at some point
you're going to run into weird, hard-to-diagnose bugs.

> outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir \tCity\tProvince\tPostalCode")

This looks like a CSV file using tabs as the separator. You really ought
to use the csv module.

http://docs.python.org/3/library/csv.html
http://docs.python.org/2/library/csv.html
http://pymotw.com/2/csv/

> for line in inHandler:
>     str = line.replace("FarmID\tAddress", " ")
>     outHandler.write(str[0:-1])
>     str = str.replace(" ","\t", 1)
>     str = str.replace(" Rd,","\tRd\t\t")
>     str = str.replace(" Rd","\tRd\t")
>     str = str.replace("Ave,","\tAve\t\t")
>     str = str.replace("Ave","\tAve\t\t")
>     str = str.replace("St ","\tSt\t\t")
>     str = str.replace("St,","\tSt\t\t")
>     str = str.replace("Dr,","\tDr\t\t")

[snip additional string manipulations]

>     str = str.replace(",","\t")
>     str = str.replace(" ON","ON\t")
>     outHandler.write(str)

Aiy aiy aiy, what a mess! I get a headache just trying to understand it!

The first question that comes to mind is that you appear to be writing
each input line *twice*, first after a very minimal set of string
manipulations (you convert the literal string "FarmID\tAddress" to a
space, then write the whole line out), the second time after a whole mess
of string replacements. Why?

If the sample data you show below is accurate, I *think* what you are
trying to do is simply suppress the header line. The first line in the
input file is:

    FarmID Address

and rather than write that you want to write a space. I don't know why
you want the output file to begin with a space, but this would be better:

    for line in inHandler:
        line = line.strip()  # Remove any leading and trailing whitespace,
            # including the trailing newline. Later, we'll add a newline
            # back in.
        if line == "FarmID\tAddress":
            outHandler.write(" ")  # Write a mysterious space.
            continue  # And skip to the next line.
        # Now process the non-header lines.

Now, as far as the non-header lines, you do a whole lot of complex string
manipulations, replacing chunks of text with or without tabs or commas
to the same text with or without tabs but in a different order. The logic
of these manipulations completely escapes me: what are you actually
trying to do here?

I *strongly* suggest that you don't try to implement your program logic
in the form of string manipulations. According to your sample data, your
data looks like this:

    1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0

i.e.

    farmId TAB address COMMA district COMMA postcode

It is much better to pull the line apart into named components,
manipulate the components directly, then put it back together in the
order you want. This makes the code more understandable, and easier to
change if you ever need to change things.

    for line in inHandler:
        line = line.strip()
        if line == "FarmID\tAddress":
            outHandler.write(" ")  # Write a mysterious space.
            continue
        # Now process the non-header lines.
        farmid, address = line.split("\t")
        farmid = farmid.strip()
        address, district, postcode = address.split(",")
        address = address.strip()
        district = district.strip()
        postcode = postcode.strip()
        # Now process the fields however you like.
        parts_of_address = address.split(" ")
        street_number = parts_of_address[0]  # first part
        street_type = parts_of_address[-1]  # last part
        street_name = parts_of_address[1:-1]  # everything else
        street_name = " ".join(street_name)

and so on for the post code. Then, at the very end, assemble the parts
you want to write out, join them with tabs, and write:

        fields = [farmid, street_number, street_name, street_type, ... ]
        outHandler.write("\t".join(fields))
        outHandler.write("\n")

Or use the csv module to do the actual writing. It will handle escaping
anything that needs escaping, newlines, tabs, etc.
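
For what it's worth, here is a rough sketch of how the split-then-join
approach might look with the csv module doing the writing. The names
split_record and convert and the HEADER column list are made up for
illustration, not taken from your script. It assumes every data line has
exactly one tab and exactly two commas, like the sample line, and it
writes a fresh header row rather than the mysterious space.

```python
import csv
import io

# Hypothetical output columns; adjust to whatever you actually need.
HEADER = ["FarmID", "StreetNum", "StreetName", "SufType",
          "City", "Province", "PostalCode"]


def split_record(line):
    """Split 'farmid TAB address, city, province postcode' into fields.

    Assumes exactly one tab and exactly two commas in the line.
    """
    farmid, rest = line.strip().split("\t")
    address, city, region = [part.strip() for part in rest.split(",")]
    parts = address.split()
    street_number = parts[0]             # first part
    street_type = parts[-1]              # last part
    street_name = " ".join(parts[1:-1])  # everything in between
    # "ON L0S 1J0" becomes province "ON" and postal code "L0S 1J0".
    province, postal_code = region.split(" ", 1)
    return [farmid.strip(), street_number, street_name, street_type,
            city, province, postal_code]


def convert(in_handle, out_handle):
    """Read raw lines from in_handle, write tab-separated rows to out_handle."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for line in in_handle:
        line = line.strip()
        if not line or line == "FarmID\tAddress":
            continue  # Skip blank lines and the input header.
        writer.writerow(split_record(line))


if __name__ == "__main__":
    # In-memory stand-ins for the real input and output files.
    source = io.StringIO(
        "FarmID\tAddress\n"
        "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
    )
    target = io.StringIO()
    convert(source, target)
    print(target.getvalue())
```

With real files, open the input and the output under two different
paths, and in Python 3 pass newline="" when opening the output file, as
the csv documentation recommends. Lines that don't fit the one-tab,
two-comma pattern will raise ValueError on unpacking, which is probably
what you want while you are still finding out how messy the data is.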

--
Steven