On 12/29/2009 9:14 AM, Ethan Furman wrote:
> Lie Ryan wrote:
>> On 12/28/2009 11:59 PM, Shawn Milochik wrote:
>>> With address data:
>>> one address may have suite data and the other might not
>>> the same city may have multiple zip codes
>>
>> Why is that even a problem? You do put suite data and zipcode into
>> different database fields, right?
>
> The issue here is not proper database design, the issue is users -- not
> one user, not two users, but millions of users, with no consistency
> amongst them. They bring you their nice tiny list of 10,000 names and
> addresses and want you to correct/normalize/mail them, and you have to
> be able to break down what they gave you into something usable.
>
> To rephrase, the issue that Shawn is referring to is the huge amount of
> data *already out there*, not brand new data.
>
> ~Ethan~

The way Shawn describes it, it sounds like his problem is with "searching the database". As I said, given a good index, searching should never be the problem. To build that "good index", you have a slightly different but easier problem to tackle: parsing. I realize that parsing inconsistent data isn't as easy as talking about it, but it's generally much easier than trying to fuzzily grep through the raw data.
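A minimal sketch of the parse-then-index idea, assuming US-style addresses given as a street line plus a ZIP code. The names `parse_address`, `build_index`, `lookup`, `ParsedAddress` and the tiny regex/abbreviation rules are hypothetical and nowhere near production grade (city and state are not handled at all). The point is only that once suite and ZIP are split into their own fields, a record with suite data and one without land under the same key, and a search becomes a dictionary lookup instead of a fuzzy scan of raw text.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional

# Hypothetical, deliberately tiny rule set; real data needs much larger tables.
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}

# 5-digit ZIP with optional +4 extension; only the first 5 digits are kept.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")
# "Suite 200", "Ste. 200", "Apt 4", "Unit B" or "#200".
SUITE_RE = re.compile(r"\b(?:suite|ste|apt|unit)\b\.?\s*#?\s*(\w+)|#\s*(\w+)", re.IGNORECASE)


class ParsedAddress(NamedTuple):
    street: str
    suite: Optional[str]
    zip5: Optional[str]


def normalize_street(text):
    """Drop punctuation, collapse whitespace, abbreviate common street words."""
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in text.split())


def parse_address(raw):
    """Split one free-form address line into street, suite and ZIP fields."""
    text = raw.lower()
    zip_match = ZIP_RE.search(text)
    zip5 = zip_match.group(1) if zip_match else None
    text = ZIP_RE.sub(" ", text)

    suite = None
    suite_match = SUITE_RE.search(text)
    if suite_match:
        suite = suite_match.group(1) or suite_match.group(2)
        text = SUITE_RE.sub(" ", text)

    return ParsedAddress(normalize_street(text), suite, zip5)


def build_index(records):
    """Map (street, zip5) -> record ids; suite is deliberately not in the key."""
    index = defaultdict(list)
    for record_id, raw in records:
        parsed = parse_address(raw)
        index[(parsed.street, parsed.zip5)].append(record_id)
    return index


def lookup(index, raw):
    """Parse the query the same way, then do a plain dictionary lookup."""
    parsed = parse_address(raw)
    return index.get((parsed.street, parsed.zip5), [])


records = [
    (1, "123 Main Street, Suite 200, 12345"),
    (2, "123 Main St. #200 12345-6789"),
    (3, "123 main st 12345"),
    (4, "9 Oak Avenue 54321"),
]
index = build_index(records)
print(lookup(index, "123 MAIN STREET 12345"))  # -> [1, 2, 3]
```

Keying on ZIP rather than city also sidesteps the "same city, several zip codes" case; the suite is still kept on each parsed record if you need it later.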
--
http://mail.python.org/mailman/listinfo/python-list