In article <mailman.4023.1364751102.2939.python-l...@python.org>, Dave Angel <da...@davea.name> wrote:
> On 03/31/2013 12:52 PM, C.T. wrote:
> > On Sunday, March 31, 2013 12:20:25 PM UTC-4, zipher wrote:
> >> <SNIP>
> >>
> >
> > Thank you, Mark! My problem is the data isn't consistently ordered. I can
> > use slicing and indexing to put the year into a tuple, but because a car
> > manufacturer could have two names (ie, Aston Martin) or a car model could
> > have two names(ie, Iron Duke), its harder to use slicing and indexing for
> > those two. I've added the following, but the output is still not what I
> > need it to be.
>
> So the correct answer is "it cannot be done," and an explanation.
>
> Many times I've been given impossible conditions for a problem. And
> invariably the correct solution is to press [back] on the supplier of the
> constraints.

In real life, you often have to deal with crappy input data (and bogus
project requirements). Sometimes you just need to be creative.

There's only a small set of car manufacturers. A good start would be
mining Wikipedia's [[List of automobile manufacturers]]. Once you've got
that list, you could try matching portions of the input against it.
Depending on how much effort you want to put into this, you could explore
all sorts of fuzzy matching (e.g., "delorean" vs. "delorean motor
company"), but even a simple search is better than giving up.

And this is a good excuse to explore some of the interesting third-party
modules. For example, mwclient ("pip install mwclient") gives you a neat
Python interface to Wikipedia. And there's a whole landscape of string
matching packages to explore.

We deal with this every day at Songza. Are Kesha and Ke$ha the same
artist? Pushing back on the record labels to clean up their catalogs
isn't going to get us very far.
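
To make "matching portions of the input against the list" a bit more
concrete, here's a rough sketch using only the standard library's
difflib. The KNOWN_MAKERS list, the find_maker name, the 0.85 cutoff,
and the whitespace-separated record format are all my own assumptions
for illustration; I haven't seen the actual data, and a real list would
come from somewhere like the Wikipedia page mentioned above.

```python
import difflib

# Hypothetical, hand-typed sample; a real list would be much longer.
KNOWN_MAKERS = ["Aston Martin", "Pontiac", "Ford", "Chevrolet"]


def find_maker(record, makers=KNOWN_MAKERS, cutoff=0.85):
    """Return (maker, leftover_words); maker is None if nothing is close.

    Every run of consecutive words in the record is scored against every
    known maker, and the best-scoring pair wins if it reaches the cutoff.
    """
    words = record.split()
    best_score, best_maker, best_rest = 0.0, None, words
    for size in range(1, len(words) + 1):
        for start in range(len(words) - size + 1):
            chunk = " ".join(words[start:start + size]).lower()
            for maker in makers:
                score = difflib.SequenceMatcher(
                    None, chunk, maker.lower()).ratio()
                if score > best_score:
                    # Keep the words outside the matched run (year, model).
                    rest = words[:start] + words[start + size:]
                    best_score, best_maker, best_rest = score, maker, rest
    if best_score < cutoff:
        return None, words
    return best_maker, best_rest


if __name__ == "__main__":
    for line in ["1965 Aston Martin DB5", "Pontiac Iron Duke 1980"]:
        print(find_maker(line))
```

Scoring every run of words, rather than slicing at fixed positions, means
it doesn't matter where the manufacturer sits in the line or how many
words it has. It won't catch a short form like "delorean" standing in
for "delorean motor company"; that would need an alias table or a
different similarity measure.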