On Jun 5, 2009, at 8:56 AM, Daniel Jomphe wrote:

> I need to generate a list of all possible American zipcodes, MD5-
> digested. Later on, I will need to do much more involved stuff,
> processor-wise, with this. But already, generating a naive list of all
> possible zipcodes is taking quite a lot of time:


Your code looks fine to me stylistically, though I'm not that
seasoned. I ran into a performance problem with my duplicate-file
finder, which I think was ultimately Java's fault: the built-in MD5
implementation isn't very fast.
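For comparison, here is a minimal sketch of the naive approach using only the JDK's built-in `java.security.MessageDigest` (this is not Fast MD5's API). It reuses a single digest instance instead of calling `getInstance` per code, and hex-encodes with a lookup table rather than `String.format`. The class name `ZipDigests` and helper `md5Hex` are hypothetical. I'm also assuming "MD5-digested" means hashing the ASCII text of each five-digit code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ZipDigests {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // One MessageDigest reused for every call; digest() resets it afterwards.
    // Not thread-safe: give each thread its own instance.
    private static final MessageDigest MD5 = newMd5();

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex MD5 of the string's ASCII bytes.
    static String md5Hex(String s) {
        byte[] hash = MD5.digest(s.getBytes(StandardCharsets.US_ASCII));
        char[] out = new char[hash.length * 2];
        for (int i = 0; i < hash.length; i++) {
            out[2 * i] = HEX[(hash[i] >> 4) & 0xf];  // high nibble
            out[2 * i + 1] = HEX[hash[i] & 0xf];     // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // Naive enumeration of every five-digit code, 00000 through 99999.
        long start = System.nanoTime();
        int count = 0;
        for (int zip = 0; zip < 100_000; zip++) {
            md5Hex(String.format("%05d", zip));
            count++;
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " digests in " + ms + " ms");
    }
}
```

Timing this against a Fast MD5 version of `md5Hex` on your own machine would show how much of the cost is the digest itself.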

I would give you two pointers:

1. Use Fast MD5:

<http://www.twmacinta.com/myjava/fast_md5.php>

This made the biggest difference with my duplicate-file finder. I
still think there are performance gains on the table, but I don't know
what they are.

2. Winnow out invalid zip codes. Not every number between 00000-0000
and 99999-9999 is a valid zip code. It might be better just to get a
list of valid zip codes from somewhere else, because I'm not sure
exactly what the rules are. I found this list by googling around:

<http://www.census.gov/tiger/tms/gazetteer/zips.txt>

I'm not sure if you can get a list of zip+4 codes as easily, but at
29,470 records, starting with this list saves you (100,000 - 29,470)
* 10,000 = 705,300,000 MD5 calculations, since each five-digit code
has 10,000 possible +4 suffixes. Plus, this list gives you some
auxiliary data that can't be inferred from the number by itself, like
state, county, and lat/lon coordinates.
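The saving can be checked, and the zip+4 expansion sketched, in a few lines of Java. This assumes you've already loaded the valid five-digit codes into a list. I haven't checked the exact layout of the census file, so parsing is left out. `ZipPlusFour`, `skippedDigests`, and `forEachZipPlusFour` are hypothetical names, and the two codes in `main` are placeholder values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ZipPlusFour {
    static final long ALL_FIVE_DIGIT = 100_000L;   // 00000..99999
    static final long SUFFIXES_PER_ZIP = 10_000L;  // -0000..-9999

    // Digests avoided by expanding only the valid five-digit codes into zip+4.
    static long skippedDigests(long validZipCount) {
        return (ALL_FIVE_DIGIT - validZipCount) * SUFFIXES_PER_ZIP;
    }

    // Hand every zip+4 candidate for the given valid codes to the sink.
    static void forEachZipPlusFour(List<String> validZips, Consumer<String> sink) {
        for (String zip : validZips) {
            for (int suffix = 0; suffix < SUFFIXES_PER_ZIP; suffix++) {
                sink.accept(zip + "-" + String.format("%04d", suffix));
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder codes; real ones would come from a list of valid zips.
        List<String> validZips = List.of("12345", "54321");
        List<String> out = new ArrayList<>();
        forEachZipPlusFour(validZips, out::add);
        System.out.println(out.size() + " candidates, first " + out.get(0)
                + ", last " + out.get(out.size() - 1));
        System.out.println(skippedDigests(29_470) + " digests skipped");
    }
}
```

Passing a `Consumer` rather than returning one big list means you can hash each candidate as it's produced instead of holding hundreds of millions of strings in memory.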

Hope that helps,

—
Daniel Lyons
http://www.storytotell.org -- Tell It!


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---