I think materializing the samples in the search space as an RDD will be
too expensive, and the amount of data will probably be larger than any
cluster can hold.

However, you could create an RDD of search ranges, where each range is
searched by one map operation. In this design, the number of rows in the
RDD will be the same as the number of executors, and we can use
mapPartitions to loop through all the samples in a range without
actually storing them in the RDD.
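
Here is a minimal sketch of that idea. The alphabet, password length,
target hash argument, and the candidate/range encoding are all made up
for illustration; the only point is that the RDD holds ranges, not
candidates, and mapPartitions enumerates the candidates lazily.

import java.security.MessageDigest
import org.apache.spark.{SparkConf, SparkContext}

object BruteForceSketch {
  // Hypothetical search space: 5-character lowercase passwords.
  val alphabet = "abcdefghijklmnopqrstuvwxyz".toCharArray
  val passwordLength = 5

  // Map an index in [0, alphabet.length ^ passwordLength) to a candidate string.
  def candidate(index: Long): String = {
    val sb = new StringBuilder
    var i = index
    (1 to passwordLength).foreach { _ =>
      sb.append(alphabet((i % alphabet.length).toInt))
      i /= alphabet.length
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sha1-brute-force"))
    val targetHash = args(0)                  // hex SHA-1 we are trying to crack
    val total = math.pow(alphabet.length, passwordLength).toLong
    val numRanges = sc.defaultParallelism     // one range per core/executor slot
    val rangeSize = (total + numRanges - 1) / numRanges

    // The RDD contains only (start, end) index ranges, never the candidates.
    val ranges = sc.parallelize(0L until numRanges.toLong, numRanges)
      .map(i => (i * rangeSize, math.min((i + 1) * rangeSize, total)))

    val matches = ranges.mapPartitions { iter =>
      // One digest instance per partition; candidates are generated on the fly.
      val md = MessageDigest.getInstance("SHA-1")
      def sha1Hex(s: String): String =
        md.digest(s.getBytes("UTF-8")).map("%02x".format(_)).mkString
      iter.flatMap { case (start, end) =>
        (start until end).iterator
          .map(candidate)
          .filter(p => sha1Hex(p) == targetHash)
      }
    }.collect()

    matches.foreach(p => println(s"Found: $p"))
    sc.stop()
  }
}

Each partition only ever holds one small (start, end) tuple, so memory
stays flat no matter how large the search space is.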

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Jun 11, 2014 at 5:24 PM, Nick Chammas
<nicholas.cham...@gmail.com> wrote:
> Spark is obviously well-suited to crunching massive amounts of data. How
> about to crunch massive amounts of numbers?
>
> A few years ago I put together a little demo for some co-workers to
> demonstrate the dangers of using SHA1 to hash and store passwords. Part of
> the demo included a live brute-forcing of hashes to show how SHA1's speed
> made it unsuitable for hashing passwords.
>
> I think it would be cool to redo the demo, but utilize the power of a
> cluster managed by Spark to crunch through hashes even faster.
>
> But how would you do that with Spark (if at all)?
>
> I'm guessing you would create an RDD that somehow defined the search space
> you're going to go through, and then partition it to divide the work up
> equally amongst the cluster's cores. Does that sound right?
>
> I wonder if others have already used Spark for computationally-intensive
> workloads like this, as opposed to just data-intensive ones.
>
> Nick
>
>
