You could keep a huge dictionary of hashes in one RDD, use a map function
to generate the hash for a given password, and look it up in your
dictionary RDD. Not sure about the performance, though. It would be nice
to see how it performs if you design it.
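
Something like this rough, untested sketch might work (the wordlist path
and the sha1Hex helper below are just placeholders):

import java.security.MessageDigest
import org.apache.spark.SparkContext

// Hex-encoded SHA-1 of a string.
def sha1Hex(s: String): String =
  MessageDigest.getInstance("SHA-1")
    .digest(s.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString

val sc = new SparkContext("local[*]", "hash-dictionary")

// Pretend we only know the hash of the password we want to recover.
val targetHash = sha1Hex("password")

// Dictionary RDD of (hash, plaintext) pairs, built with a map over a wordlist.
val dictionary = sc.textFile("/path/to/wordlist.txt")   // placeholder path
  .map(word => (sha1Hex(word), word))

// Look the target hash up in the dictionary.
println(dictionary.lookup(targetHash))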

Thanks
Best Regards


On Thu, Jun 12, 2014 at 7:23 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Yes, I mean the RDD would just have elements that define partitions or
> ranges within the search space, not the actual hashes. It's really just
> using the RDD as a control structure rather than as a real data set.
>
> As you noted, we don't need to store any hashes. We just need to check
> them as they are computed against our target hash.
>
> On Wednesday, June 11, 2014, DB Tsai <dbt...@stanford.edu> wrote:
>
>> I think creating the samples in the search space within an RDD will be
>> too expensive, and the amount of data will probably be larger than any
>> cluster can hold.
>>
>> However, you could create an RDD of search ranges, where each range
>> will be searched by one map operation. In this design, the number of
>> rows in the RDD will be the same as the number of executors, and we can
>> use mapPartitions to loop through all the samples in a range without
>> actually storing them in the RDD.
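>>
>> Roughly, as an untested sketch (the fixed-length lowercase candidate
>> encoding and the sha1Hex helper are just placeholders for a real search
>> space):
>>
>> import java.security.MessageDigest
>>
>> // Hex-encoded SHA-1 of a string.
>> def sha1Hex(s: String): String =
>>   MessageDigest.getInstance("SHA-1")
>>     .digest(s.getBytes("UTF-8"))
>>     .map("%02x".format(_))
>>     .mkString
>>
>> val alphabet = "abcdefghijklmnopqrstuvwxyz"
>> val length = 5
>> val total = math.pow(alphabet.length, length).toLong
>>
>> // Map an index in [0, total) to a candidate password (base-26 encoding).
>> def candidate(i: Long): String =
>>   (0 until length).map { p =>
>>     alphabet(((i / math.pow(alphabet.length, p).toLong) % alphabet.length).toInt)
>>   }.mkString
>>
>> val target = sha1Hex("hello")           // pretend we only know the hash
>> val numRanges = sc.defaultParallelism   // sc is the SparkContext
>>
>> // One (start, end) range per partition; candidates never live in the RDD.
>> val ranges = sc.parallelize(0 until numRanges, numRanges)
>>   .map(r => (r * total / numRanges, (r + 1) * total / numRanges))
>>
>> val cracked = ranges.mapPartitions { it =>
>>   it.flatMap { case (start, end) =>
>>     (start until end).iterator.map(candidate).filter(c => sha1Hex(c) == target)
>>   }
>> }.collect()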
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Jun 11, 2014 at 5:24 PM, Nick Chammas
>> <nicholas.cham...@gmail.com> wrote:
>> > Spark is obviously well-suited to crunching massive amounts of data. How
>> > about crunching massive amounts of numbers?
>> >
>> > A few years ago I put together a little demo for some co-workers to
>> > demonstrate the dangers of using SHA1 to hash and store passwords.
>> > Part of the demo included a live brute-forcing of hashes to show how
>> > SHA1's speed made it unsuitable for hashing passwords.
>> >
>> > I think it would be cool to redo the demo, but utilize the power of a
>> > cluster managed by Spark to crunch through hashes even faster.
>> >
>> > But how would you do that with Spark (if at all)?
>> >
>> > I'm guessing you would create an RDD that somehow defines the search
>> > space you're going to go through, and then partition it to divide the
>> > work up equally amongst the cluster's cores. Does that sound right?
>> >
>> > I wonder if others have already used Spark for computationally-intensive
>> > workloads like this, as opposed to just data-intensive ones.
>> >
>> > Nick
>> >
>> >
>>
>
