Indeed, rainbow tables are helpful for working on unsalted hashes. They
turn a large amount of computational work into a small amount of
computation plus a bit of lookup work. The rainbow tables themselves
could easily be captured as RDDs.
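
For example, a table could be loaded as a pair RDD keyed by chain
endpoint and partitioned so that each lookup only touches one partition.
A rough sketch in Scala (the HDFS path and the "start,end" file format
here are invented for illustration):

    import org.apache.spark.HashPartitioner

    // Each line of the hypothetical table file is "startPassword,endPassword".
    val rainbow = sc.textFile("hdfs:///tables/sha1_alnum_1-9.csv")
      .map { line =>
        val Array(start, end) = line.split(",")
        (end, start)                    // key by chain endpoint
      }
      .partitionBy(new HashPartitioner(512))
      .cache()

    // Because the RDD has a known partitioner, lookup() scans only the
    // single partition that can contain the key, not the whole table.
    val chainStarts: Seq[String] = rainbow.lookup("candidateEndpoint")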

I guess I derailed my own discussion by focusing on password cracking,
since my intention was to explore how Spark applications are written for
compute-intensive workloads as opposed to data-intensive ones. And for
certain types of password cracking, the best approach is to turn compute
work into data work. :)


On Thu, Jun 12, 2014 at 5:32 AM, Marek Wiewiorka <marek.wiewio...@gmail.com>
wrote:

> This is actually what I've already mentioned - with rainbow tables kept
> in memory it could be really fast!
>
> Marek
>
>
> 2014-06-12 9:25 GMT+02:00 Michael Cutler <mich...@tumra.com>:
>
>> Hi Nick,
>>
>> The great thing about any *unsalted* hash is that you can precompute it
>> ahead of time; then it is just a lookup to find the password which matches
>> the hash in seconds -- always makes for a more exciting demo than "come
>> back in a few hours".
>>
>> It is a no-brainer to write a generator function to create all possible
>> passwords from a charset like
>> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", hash
>> them, and store them to look up later. It is, however, incredibly
>> wasteful of storage space.
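>>
>> For illustration, the generator-and-store job might look something like
>> this in Spark -- just a sketch, with the output path invented, and
>> actually running it to completion is exactly the storage problem
>> quantified below:
>>
>>     import java.security.MessageDigest
>>
>>     val charset = "abcdefghijklmnopqrstuvwxyz" +
>>                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
>>
>>     def sha1Hex(s: String): String =
>>       MessageDigest.getInstance("SHA-1").digest(s.getBytes("UTF-8"))
>>         .map("%02x".format(_)).mkString
>>
>>     // Emit the prefix itself plus every extension of it up to maxLen.
>>     def candidates(prefix: String, maxLen: Int): Iterator[String] =
>>       if (prefix.length == maxLen) Iterator(prefix)
>>       else Iterator(prefix) ++
>>            charset.iterator.flatMap(c => candidates(prefix + c, maxLen))
>>
>>     // One task per starting character; each writes "hash,password" lines.
>>     sc.parallelize(charset.map(_.toString), charset.length)
>>       .flatMap(p => candidates(p, 9))
>>       .map(pw => s"${sha1Hex(pw)},$pw")
>>       .saveAsTextFile("hdfs:///lookup/sha1-alnum-1to9")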
>>
>> - all passwords from 1 to 9 characters long
>> - using the charset above = 13,759,005,997,841,642 passwords
>> - assuming 20 bytes to store the SHA-1 hash and up to 9 more for the
>>   password, that works out to approximately 399 petabytes
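>>
>> To sanity-check those numbers:
>>
>>     // 62 + 62^2 + ... + 62^9 candidate passwords:
>>     val total = (1 to 9).map(n => BigInt(62).pow(n)).sum
>>     // total = 13759005997841642
>>     // At up to 29 bytes per entry (20-byte SHA-1 + 9-byte password):
>>     // total * 29 = 399011173937407618 bytes, i.e. roughly 399 PB.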
>>
>> Thankfully there is already a more efficient/compact mechanism to achieve
>> this using Rainbow Tables <http://en.wikipedia.org/wiki/Rainbow_table> --
>> better still, there is an active community of people who have already
>> precomputed many of these datasets.  The above dataset is readily
>> available to download and is just 864GB -- much more feasible.
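>>
>> The trick, roughly, is that each stored row stands in for a whole chain
>> of passwords: a reduction function maps a hash back into the password
>> space, and only the first and last password of each chain are kept, so
>> everything in between is traded from storage back into CPU time. A
>> deliberately simplified sketch (a real implementation also has to deal
>> with chain merges and false alarms; charset as defined earlier):
>>
>>     def sha1(s: String): Array[Byte] =
>>       java.security.MessageDigest.getInstance("SHA-1")
>>         .digest(s.getBytes("UTF-8"))
>>
>>     // Map a hash back into the password space. Using a different
>>     // reduction at each chain position keeps chains from collapsing.
>>     def reduce(hash: Array[Byte], pos: Int, pwLen: Int): String =
>>       (0 until pwLen).map { j =>
>>         charset(((hash(j % hash.length) & 0xff) + pos) % charset.length)
>>       }.mkString
>>
>>     val CHAIN_LEN = 10000
>>
>>     // Only (start, end) is ever stored; the ~10,000 passwords in
>>     // between are recomputed on demand.
>>     def chainEnd(start: String): String =
>>       (0 until CHAIN_LEN).foldLeft(start) { (pw, pos) =>
>>         reduce(sha1(pw), pos, pw.length)
>>       }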
>>
>> All you need to do then is write a rainbow-table lookup function in Spark
>> and leverage the precomputed files stored in HDFS.  Done right, you should
>> be able to achieve interactive (few-second) lookups.
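>>
>> As a sketch of what that lookup function could look like -- reusing the
>> sha1/reduce/CHAIN_LEN helpers above; the HDFS path and "start,end" file
>> format are again invented, and real tables are more involved:
>>
>>     val table = sc.textFile("hdfs:///rainbow/sha1_alnum.csv")
>>       .map { line => val Array(s, e) = line.split(","); (e, s) }
>>       .cache()
>>
>>     // If the target hash occurred at position `pos` of some chain,
>>     // rolling it forward yields that chain's endpoint.
>>     def possibleEndpoints(target: Array[Byte],
>>                           pwLen: Int): Seq[(String, Int)] =
>>       (0 until CHAIN_LEN).map { pos =>
>>         var pw = reduce(target, pos, pwLen)
>>         for (i <- pos + 1 until CHAIN_LEN) pw = reduce(sha1(pw), i, pwLen)
>>         (pw, pos)
>>       }
>>
>>     def crack(target: Array[Byte], pwLen: Int): Option[String] = {
>>       // Join the candidate endpoints against the table in parallel...
>>       val hits = sc.parallelize(possibleEndpoints(target, pwLen))
>>         .join(table).values.collect()        // (pos, chainStart) pairs
>>       // ...then walk each matching chain from its start to find a
>>       // password that actually hashes to the target.
>>       hits.iterator.flatMap { case (_, start) =>
>>         var pw = start
>>         (0 until CHAIN_LEN).iterator.map { pos =>
>>           val candidate = pw
>>           pw = reduce(sha1(candidate), pos, pwLen)
>>           candidate
>>         }.find(p => sha1(p).sameElements(target))
>>       }.find(_ => true)
>>     }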
>>
>> Have fun!
>>
>> MC
>>
>>
>>
>> On 12 June 2014 01:24, Nick Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Spark is obviously well-suited to crunching massive amounts of data. How
>>> about crunching massive amounts of numbers?
>>>
>>> A few years ago I put together a little demo for some co-workers to
>>> demonstrate the dangers of using SHA1
>>> <http://codahale.com/how-to-safely-store-a-password/> to hash and store
>>> passwords. Part of the demo included a live brute-forcing of hashes to show
>>> how SHA1's speed made it unsuitable for hashing passwords.
>>>
>>> I think it would be cool to redo the demo, but utilize the power of a
>>> cluster managed by Spark to crunch through hashes even faster.
>>>
>>> But how would you do that with Spark (if at all)?
>>>
>>> I'm guessing you would create an RDD that somehow defines the search
>>> space you want to cover, and then partition it to divide the work up
>>> equally amongst the cluster's cores. Does that sound right?
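>>>
>>> Something along these lines, maybe -- a rough sketch in Scala, using
>>> SHA-1's well-known test vector sha1("abc") as a stand-in target:
>>>
>>>     import java.security.MessageDigest
>>>
>>>     val charset = "abcdefghijklmnopqrstuvwxyz" +
>>>                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
>>>     val target = "a9993e364706816aba3e25717850c26c9cd0d89d" // sha1("abc")
>>>
>>>     def sha1Hex(s: String): String =
>>>       MessageDigest.getInstance("SHA-1").digest(s.getBytes("UTF-8"))
>>>         .map("%02x".format(_)).mkString
>>>
>>>     // Emit the prefix itself plus every extension of it up to maxLen.
>>>     def expand(prefix: String, maxLen: Int): Iterator[String] =
>>>       if (prefix.length == maxLen) Iterator(prefix)
>>>       else Iterator(prefix) ++
>>>            charset.iterator.flatMap(c => expand(prefix + c, maxLen))
>>>
>>>     // Define the search space by two-character prefixes (62^2 = 3,844
>>>     // of them), partitioned so each core gets an equal slice of work.
>>>     val prefixes = for (a <- charset; b <- charset) yield s"$a$b"
>>>     val cracked = sc.parallelize(prefixes, 256)
>>>       .flatMap(p => expand(p, 6))   // every candidate of length 2 to 6
>>>       .filter(pw => sha1Hex(pw) == target)
>>>       .take(1)                      // Array("abc"), eventually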
>>>
>>> I wonder if others have already used Spark for computationally intensive
>>> workloads like this, as opposed to just data-intensive ones.
>>>
>>> Nick
>>>
>>>
>>>
>>
>>
>
