Dear Bradley,
So basically you're asking others to do your homework for you?
The only useful purpose your list serves is to demonstrate why people
shouldn't try to build fancy algorithms that rely on an entirely
unreliable datasource.
All you end up with are hacked-together algorithms that contain a whole
load of assumptions and will be obsolete by the time you release version
1.0, because people will have changed their naming conventions a million
times.
For example, to pick one entry from your list:
<iata>([^a-z]+[a-z]+\d*){3}.ic.ac.uk
ic.ac.uk = Imperial College, a well-known and respected ivory-tower
institution in the UK. The vast majority of their campus sites are
located in London, and only one or two are outside London, in South East
England. It is therefore very unlikely they'll be using IATA codes; in
fact, last time I checked they were using conventions such as
hostname.doc.ic.ac.uk and hostname.ch.ic.ac.uk.
Far from being IATA codes, the intermediate subdomains actually refer to
departments (DepartmentOfComputing and CHemistry in the two I quoted).
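To make the point concrete, here is a minimal Python sketch of how the label to the left of ic.ac.uk would be read under the convention I describe. The function name, the lookup table and the sample hostnames are my own illustration, not anything Imperial publishes, and the table holds only the two departments quoted above.

```python
# Hypothetical lookup table: only the two department labels quoted in this
# message. This is an illustration, not an authoritative list.
DEPARTMENTS = {
    "doc": "Department of Computing",
    "ch": "Chemistry",
}


def department_label(hostname, suffix="ic.ac.uk"):
    """Return the label immediately left of `suffix`, or None.

    Assumes the host.department.ic.ac.uk layout described above; names that
    do not end in the suffix, or that have no label between the host and the
    suffix, yield None.
    """
    labels = hostname.lower().rstrip(".").split(".")
    suffix_labels = suffix.split(".")
    n = len(suffix_labels)
    # Need at least a host label and a department label before the suffix.
    if len(labels) < n + 2 or labels[-n:] != suffix_labels:
        return None
    return labels[-n - 1]


if __name__ == "__main__":
    for name in ("hostname.doc.ic.ac.uk", "hostname.ch.ic.ac.uk"):
        label = department_label(name)
        print(name, "->", label, "=", DEPARTMENTS.get(label, "unknown"))
```

Note that nothing here consults an airport table: the label is an organisational tag, and it only means something if you already know the site's own convention.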
Sorry to rain on your parade, but someone had to say it.