On Jun 5, 2009, at 8:56 AM, Daniel Jomphe wrote:
> I need to generate a list of all possible American zipcodes, MD5-digested.
> Later on, I will need to do much more involving stuff, processor-wise,
> with this. But already, generating a naive list of all possible zipcodes
> is taking quite a deal of time:

Your code, stylistically, looks fine to me, though I'm not that seasoned. I ran into a performance issue with my file duplicate finder, which I think is ultimately Java's fault, because the built-in MD5 function isn't very fast. I would give you two pointers:

1. Use Fast MD5: <http://www.twmacinta.com/myjava/fast_md5.php>

This made the biggest difference with my file dup finder. I still think there are performance gains on the table, but I don't know what they are.

2. Winnow out invalid zip codes. Not every number between 00000-0000 and 99999-9999 is a valid zip code. It might be better just to get a list of valid zip codes from somewhere else, because I'm not sure what the rules are exactly. I found this list by googling around: <http://www.census.gov/tiger/tms/gazetteer/zips.txt>

I'm not sure if you can get a list of zip+4's as easily, but at 29,470 records, you're saving yourself 70,530 * 10,000 MD5 calculations (the 100,000 - 29,470 five-digit codes that aren't in the list, times the 10,000 possible +4 suffixes each) by just starting with this list. Plus this list gives you some auxiliary data that can't be inferred from the number by itself, like state and county and the lat/lon coordinates.

Hope that helps,

Daniel Lyons
http://www.storytotell.org -- Tell It!
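P.S. A rough sketch of the second pointer, in plain Java since the digest work ends up in java.security.MessageDigest either way: hash only the zip codes you got from a list instead of every possible number. The class name ZipDigests, the helpers toHex and digestAll, and the sample zip strings are all made up for illustration; in practice the list would come from a file such as the one linked above, whose format I haven't checked. This uses the built-in MD5, not Fast MD5.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ZipDigests {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Hex-encode a digest by table lookup, two characters per byte.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }

    // Map each zip string to its MD5 hex digest, keeping the input order.
    // Assumption: zips are hashed as their ASCII text, e.g. "90210".
    static Map<String, String> digestAll(List<String> zips) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String zip : zips) {
            // digest(byte[]) resets the instance, so it is safe to reuse.
            byte[] hash = md5.digest(zip.getBytes(StandardCharsets.US_ASCII));
            result.put(zip, toHex(hash));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values only; stand-ins for zips read from a real list.
        List<String> sample = Arrays.asList("00501", "10001", "90210");
        digestAll(sample).forEach((zip, hex) -> System.out.println(zip + " " + hex));
    }
}
```

Reusing one MessageDigest instance avoids a lookup per zip code. Whether that shows up in your timings is something you'd have to measure.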