On Sunday, 26 January 2014 19:40:26 UTC-5, Steven D'Aprano wrote:
> On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:
>
> > I have been working on a python script that separates mailing addresses
> > into different components.
> >
> > Here is my code:
> >
> > inFile = "directory"
> > outFile = "directory"
> > inHandler = open(inFile, 'r')
> > outHandler = open(outFile, 'w')
>
> Are you *really* opening the same file for reading and writing at the
> same time?
>
> Even if your operating system allows that, surely it's not a good idea.
> You might get away with it for small files, but at some point you're
> going to run into weird, hard-to-diagnose bugs.
>
> > outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
>
> This looks like a CSV file using tabs as the separator. You really ought
> to use the csv module.
>
> http://docs.python.org/3/library/csv.html
> http://docs.python.org/2/library/csv.html
>
> http://pymotw.com/2/csv/
>
> > for line in inHandler:
> >     str = line.replace("FarmID\tAddress", " ")
> >     outHandler.write(str[0:-1])
> >     str = str.replace(" ","\t", 1)
> >     str = str.replace(" Rd,","\tRd\t\t")
> >     str = str.replace(" Rd","\tRd\t")
> >     str = str.replace("Ave,","\tAve\t\t")
> >     str = str.replace("Ave","\tAve\t\t")
> >     str = str.replace("St ","\tSt\t\t")
> >     str = str.replace("St,","\tSt\t\t")
> >     str = str.replace("Dr,","\tDr\t\t")
> [snip additional string manipulations]
> >     str = str.replace(",","\t")
> >     str = str.replace(" ON","ON\t")
> >     outHandler.write(str)
>
> Aiy aiy aiy, what a mess! I get a headache just trying to understand it!
>
> The first question that comes to mind is that you appear to be writing
> each input line *twice*, first after a very minimal set of string
> manipulations (you convert the literal string "FarmID\tAddress" to a
> space, then write the whole line out), the second time after a whole mess
> of string replacements. Why?
>
> If the sample data you show below is accurate, I *think* what you are
> trying to do is simply suppress the header line. The first line in the
> input file is:
>
> FarmID Address
>
> and rather than write that you want to write a space. I don't know why
> you want the output file to begin with a space, but this would be better:
>
> for line in inHandler:
>     line = line.strip()  # Remove any leading and trailing whitespace,
>     # including the trailing newline. Later, we'll add a newline
>     # back in.
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue  # And skip to the next line.
>     # Now process the non-header lines.
>
> Now, as far as the non-header lines, you do a whole lot of complex string
> manipulations, replacing chunks of text with or without tabs or commas to
> the same text with or without tabs but in a different order. The logic of
> these manipulations completely escape me: what are you actually trying to
> do here?
>
> I *strongly* suggest that you don't try to implement your program logic
> in the form of string manipulations. According to your sample data, your
> data looks like this:
>
> 1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
>
> i.e.
>
> farmId TAB address COMMA district COMMA postcode
>
> It is much better to pull the line apart into named components,
> manipulate the components directly, then put it back together in the
> order you want. This makes the code more understandable, and easier to
> change if you ever need to change things.
>
> for line in inHandler:
>     line = line.strip()
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue
>     # Now process the non-header lines.
>     farmid, address = line.split("\t")
>     farmid = farmid.strip()
>     address, district, postcode = address.split(",")
>     address = address.strip()
>     district = district.strip()
>     postcode = postcode.strip()
>     # Now process the fields however you like.
>     parts_of_address = address.split(" ")
>     street_number = parts_of_address[0]  # first part
>     street_type = parts_of_address[-1]  # last part
>     street_name = parts_of_address[1:-1]  # everything else
>     street_name = " ".join(street_name)
>
> and so on for the post code. Then, at the very end, assemble the parts
> you want to write out, join them with tabs, and write:
>
>     fields = [farmid, street_number, street_name, street_type, ... ]
>     outHandler.write("\t".join(fields))
>     outHandler.write("\n")
>
> Or use the csv module to do the actual writing. It will handle escaping
> anything that needs escaping, newlines, tabs, etc.
>
> --
> Steven

I'm not reading and writing to the same file; I just changed the actual paths to "directory". This is for a school assignment, and we haven't been taught any of the stuff you're talking about. Although I appreciate your help, everything needs to stay as is and I just need to create the loop to get rid of the farmID from the end of the postal codes.
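
For reference, a minimal sketch of the split-into-components idea quoted above, applied only to the one sample line from the thread. The function name split_address_line and the province/postal-code split are assumptions made for illustration and are not part of the original script. It also assumes that the farm ID and the address are separated by a single tab and that each address contains exactly two commas.

```python
def split_address_line(line):
    """Split one 'farmId TAB address, city, province postcode' line into fields.

    Hypothetical helper: the layout is assumed from the sample line in the thread.
    """
    # Drop the trailing newline, then separate the farm ID from the rest.
    farm_id, rest = line.strip().split("\t")
    # Assumes exactly two commas: street address, city, province + postal code.
    address, city, region = [part.strip() for part in rest.split(",")]
    words = address.split(" ")
    street_number = words[0]             # first word
    street_type = words[-1]              # last word, e.g. "Rd"
    street_name = " ".join(words[1:-1])  # everything in between
    # Assumes the province is the first word, e.g. "ON L0S 1J0".
    province, postal_code = region.split(" ", 1)
    return [farm_id, street_number, street_name, street_type,
            city, province, postal_code]


sample = "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
fields = split_address_line(sample)
print("\t".join(fields))
```

A line that does not match that shape (for example, an address with an extra comma) raises ValueError at one of the unpacking steps, so real data would need a check before this.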