Indeed, rainbow tables are helpful for attacking unsalted hashes. They turn a large amount of up-front computational work into a small amount of computation plus a lookup. The rainbow tables themselves could easily be captured as RDDs.
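To make that concrete, here is a rough, untested Scala sketch. It keeps a flat hash -> password lookup table as a cached RDD rather than true rainbow chains (a real rainbow-table lookup would also need the reduction functions and chain walking that make the storage savings possible). The HDFS path and record format are made up:

import org.apache.spark.{SparkConf, SparkContext}

object HashLookupDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HashLookupDemo"))

    // Precomputed (SHA-1 hex, password) pairs, one tab-separated pair per
    // line in HDFS, partitioned across the cluster and cached in memory.
    // The path and record layout here are made up for illustration.
    val table = sc.textFile("hdfs:///tables/sha1-lookup/*")
      .map { line =>
        val Array(hash, password) = line.split('\t')
        (hash, password)
      }
      .cache()

    val target = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8" // SHA-1 of "password"

    // With the table cached, each lookup is a parallel scan rather than a
    // recomputation -- compute work turned into data work.
    val matches = table.filter { case (hash, _) => hash == target }
      .values
      .collect()

    matches.foreach(pw => println(s"match: $pw"))
    sc.stop()
  }
}

A true rainbow-table version would store only (chain start, chain end) pairs and walk the reduction chain at lookup time, which is exactly the trade of extra compute for far less storage that makes the 864GB figure below possible.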
I guess I derailed my own discussion by focusing on password cracking, since my intention was to explore how Spark applications are written for compute-intensive workloads as opposed to data-intensive ones. And for certain types of password cracking, the best approach is to turn compute work into data work. :)

On Thu, Jun 12, 2014 at 5:32 AM, Marek Wiewiorka <marek.wiewio...@gmail.com> wrote:

> This is actually what I've already mentioned -- with rainbow tables kept
> in memory it could be really fast!
>
> Marek
>
> 2014-06-12 9:25 GMT+02:00 Michael Cutler <mich...@tumra.com>:
>
>> Hi Nick,
>>
>> The great thing about *unsalted* hashes is that you can precompute them
>> ahead of time; after that it is just a lookup to find the password that
>> matches the hash, in seconds -- which always makes for a more exciting
>> demo than "come back in a few hours".
>>
>> It is a no-brainer to write a generator function that creates all
>> possible passwords from a charset like
>> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", hashes
>> them, and stores them for later lookup. It is, however, incredibly
>> wasteful of storage space:
>>
>> - all passwords from 1 to 9 characters long
>> - using the charset above = 13,759,005,997,841,642 passwords
>> - assuming 20 bytes to store the SHA-1 and up to 9 bytes to store the
>>   password, that comes to approximately 375.4 petabytes
>>
>> Thankfully there is already a more efficient, compact mechanism for
>> this: Rainbow Tables <http://en.wikipedia.org/wiki/Rainbow_table>.
>> Better still, there is an active community of people who have already
>> precomputed many of these datasets. The dataset above is readily
>> available to download and is just 864GB -- much more feasible.
>>
>> All you need to do then is write a rainbow-table lookup function in
>> Spark and leverage the precomputed files stored in HDFS. Done right,
>> you should be able to achieve interactive (few-second) lookups.
>>
>> Have fun!
>>
>> MC
>>
>> *Michael Cutler*
>> Founder, CTO
>> Email: mich...@tumra.com
>> Web: tumra.com <http://tumra.com/>
>>
>>
>> On 12 June 2014 01:24, Nick Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Spark is obviously well suited to crunching massive amounts of data.
>>> How about crunching massive amounts of numbers?
>>>
>>> A few years ago I put together a little demo for some co-workers to
>>> demonstrate the dangers of using SHA-1
>>> <http://codahale.com/how-to-safely-store-a-password/> to hash and
>>> store passwords. Part of the demo included live brute-forcing of
>>> hashes to show how SHA-1's speed makes it unsuitable for hashing
>>> passwords.
>>>
>>> I think it would be cool to redo the demo, but harness the power of a
>>> Spark-managed cluster to crunch through hashes even faster.
>>>
>>> But how would you do that with Spark (if at all)?
>>>
>>> I'm guessing you would create an RDD that somehow defines the search
>>> space you're going to go through, and then partition it to divide the
>>> work up equally amongst the cluster's cores. Does that sound right?
>>>
>>> I wonder if others have already used Spark for computationally
>>> intensive workloads like this, as opposed to just data-intensive ones.
>>>
>>> Nick
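To answer my own question from the quoted message above: here is a rough, untested Scala sketch of what I had in mind. An RDD of task numbers defines the search space as index ranges, and each task generates and hashes its own slice of candidates. All names are illustrative, and the three-character space is only there to keep the example small:

import java.security.MessageDigest

import org.apache.spark.{SparkConf, SparkContext}

object BruteForceDemo {
  // The charset from Michael's email.
  val charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

  // Bijection from an index in [0, 62^length) to a candidate password,
  // so every task can generate its own slice of the space independently.
  def candidate(index: Long, length: Int): String = {
    val sb = new StringBuilder
    var i = index
    for (_ <- 1 to length) {
      sb += charset((i % charset.length).toInt)
      i /= charset.length
    }
    sb.toString
  }

  def sha1Hex(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BruteForceDemo"))

    val target = "a9993e364706816aba3e25717850c26c9cd0d89d" // SHA-1 of "abc"
    val length = 3 // tiny space for the sketch; a real demo would sweep lengths 1..9
    val total = math.pow(charset.length, length).toLong
    val numTasks = 1000
    val perTask = total / numTasks + 1

    // The RDD only carries task numbers, so it stays tiny; candidates are
    // generated lazily inside each task and never materialized or shuffled.
    val found = sc.parallelize(0 until numTasks, numTasks)
      .flatMap { t =>
        val start = t * perTask
        val end = math.min(start + perTask, total)
        Iterator.iterate(start)(_ + 1L).takeWhile(_ < end).map(candidate(_, length))
      }
      .filter(pw => sha1Hex(pw) == target)
      .take(1) // scans partitions in waves and stops once a match turns up

    found.foreach(pw => println(s"cracked: $pw"))
    sc.stop()
  }
}

Since each element of the parallelized range just names a slice of the index space, the work divides evenly across the cluster's cores without ever shipping the candidate passwords around.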