On Tue, 21 Jan 2014 16:06:56 -0800, Shane Konings wrote:

> The following is a sample of the data. There are hundreds of lines that
> need to have an automated process of splitting the strings into headings
> to be imported into Excel with these headings
> 
> ID  Address  StreetNum  StreetName  SufType  Dir   City  Province 
> PostalCode

Ok, the following general method seems to work:

First, use a regex to capture two numeric groups and the rest of the line, 
each separated by whitespace. If you can't find all three fields, you have 
an unexpected data format.

re.search( r"(\d+)\s+(\d+)\s+(.*)", data )
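
As a rough sketch of that first step (the input line here is made up, since 
the sample data isn't reproduced in this message, and the variable names 
are only illustrative):

```python
import re

# Hypothetical input line; the real data may be laid out differently.
line = "1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0"

m = re.search(r"(\d+)\s+(\d+)\s+(.*)", line)
if m is None:
    # One of the three fields is missing: unexpected data format.
    raise ValueError("unexpected data format: " + line)

# groups() returns the three captured strings in order.
rec_id, street_num, rest = m.groups()
print(rec_id, street_num, rest)
```

re.search returns None when nothing matches, so that test is the "can't 
find all three fields" check.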

Second, split the rest of the line on a regex of comma + 0 or more 
whitespace.

re.split( r",\s*", data )

Check that the rest of the line has 3 or 4 bits, otherwise you have an 
unexpected lack or excess of data fields.
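
Something like the following would do the split and the count check. It is 
only a sketch: the string is a made-up example, and the pattern uses \s* so 
that a comma with no whitespace after it still splits, as described above.

```python
import re

# Hypothetical "rest of the line" left over from the first search.
rest = "Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0"

# Split on a comma followed by zero or more whitespace characters.
bits = re.split(r",\s*", rest)

if len(bits) not in (3, 4):
    raise ValueError("expected 3 or 4 fields, got %d" % len(bits))

print(bits)
```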

Split the first bit of the rest of the line into street name and suffix/
type. If you can't split it, use it as the street name and set the suffix/
type to blank.

re.search( r"(.*)\s+(\w+)", data )
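
A minimal sketch of that step, with the fallback for a name that won't 
split. The function name split_street and the example street names are my 
own, not from the original data:

```python
import re

def split_street(first_bit):
    """Return (street_name, suffix_type); suffix is blank if no split."""
    m = re.search(r"(.*)\s+(\w+)", first_bit)
    if m is None:
        # No whitespace to split on: use the whole thing as the name.
        return first_bit, ""
    # The greedy (.*) takes everything up to the last run of whitespace,
    # so the final word ends up as the suffix/type.
    return m.group(1), m.group(2)

print(split_street("Niagara Stone Rd"))
print(split_street("Broadway"))
```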

If there are 3 bits in rest of line, set direction to blank, otherwise 
set direction to the second bit.

Set the city to the last but one bit of the rest of the line.
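
In list terms, that could look roughly like this (bits is assumed to be the 
3 or 4 item list from the comma split; the function name and the sample 
lists are hypothetical):

```python
def direction_and_city(bits):
    """Pick direction and city out of the comma-separated bits."""
    # With 4 bits the second one is the direction; with 3 there is none.
    direction = bits[1] if len(bits) == 4 else ""
    # The city is always the last but one bit.
    city = bits[-2]
    return direction, city

print(direction_and_city(["Niagara Stone Rd", "Niagara-On-The-Lake", "ON L0S 1J0"]))
print(direction_and_city(["Main St", "W", "Hamilton", "ON L8P 1A1"]))
```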

Capture one word followed by two words in the last bit of the rest of the 
line, and use these as the province and postcode.

re.search( r"(\w+)\s+(\w+\s+\w+)", data )
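
For example, assuming the last bit looks like a two-letter province 
followed by a two-part postcode (the value below is made up):

```python
import re

# Hypothetical last bit of the rest of the line.
last_bit = "ON L0S 1J0"

m = re.search(r"(\w+)\s+(\w+\s+\w+)", last_bit)
if m is None:
    raise ValueError("can't find province and postcode in: " + last_bit)

# Group 1 is the single word, group 2 is the pair of words.
province, postal_code = m.group(1), m.group(2)
print(province, postal_code)
```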

Providing none of the searches or the split errored, you should now have 
the data fields you need to write. The easiest way to write them might be 
to assemble them as a list and use the csv module.
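
A sketch of the csv part, using the headings from the original post. The 
row values are invented, the Address column is assumed to be the street 
number plus name and type, and io.StringIO stands in for a real file so the 
example is self-contained:

```python
import csv
import io

HEADINGS = ["ID", "Address", "StreetNum", "StreetName", "SufType",
            "Dir", "City", "Province", "PostalCode"]

# Hypothetical record assembled from the fields parsed earlier.
row = ["1", "1067 Niagara Stone Rd", "1067", "Niagara Stone", "Rd",
       "", "Niagara-On-The-Lake", "ON", "L0S 1J0"]

# For a real file, use open("out.csv", "w", newline="") instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(HEADINGS)
writer.writerow(row)

print(out.getvalue())
```

The csv module takes care of quoting any field that happens to contain a 
comma, which is why it's safer than joining the list by hand.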

I'm assuming you're capable of working out from the help on the Python re 
module what to pass as data in each call, how to access the captured 
results of a search, and how to use the results of a split. I'm also 
assuming you're capable of working out how to use the csv module from the 
documentation. If you're not, then either go back and ask your lecturer 
for help, or tell your boss to hire a real programmer for his quick and 
easy coding jobs.

-- 
Denis McMahon, denismfmcma...@gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list
