I think materializing all the samples in the search space within an RDD would be too expensive; the amount of data would probably be larger than any cluster could hold.
However, you could create an RDD of search ranges, where each range is searched by one map operation. In this design, the number of rows in the RDD is the same as the number of executors, and you can use mapPartitions to loop through all the samples in a range without ever storing them in the RDD. A rough sketch is included below the quoted message.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Wed, Jun 11, 2014 at 5:24 PM, Nick Chammas <nicholas.cham...@gmail.com> wrote:
> Spark is obviously well-suited to crunching massive amounts of data. How
> about crunching massive amounts of numbers?
>
> A few years ago I put together a little demo for some co-workers to
> demonstrate the dangers of using SHA1 to hash and store passwords. Part of
> the demo included a live brute-forcing of hashes to show how SHA1's speed
> made it unsuitable for hashing passwords.
>
> I think it would be cool to redo the demo, but utilize the power of a
> cluster managed by Spark to crunch through hashes even faster.
>
> But how would you do that with Spark (if at all)?
>
> I'm guessing you would create an RDD that somehow defines the search space
> you're going to go through, and then partition it to divide the work up
> equally amongst the cluster's cores. Does that sound right?
>
> I wonder if others have already used Spark for computationally intensive
> workloads like this, as opposed to just data-intensive ones.
>
> Nick
>
> ________________________________
> View this message in context: Using Spark to crack passwords
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
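
Here is a rough sketch of the range-based design described above, not a definitive implementation. It assumes a hypothetical search space of fixed-length lowercase passwords and a hex-encoded target SHA-1 digest passed as the first program argument; the RDD holds only (start, end) candidate-index ranges, one per task, and each partition generates and hashes its candidates on the fly inside mapPartitions.

```scala
import java.security.MessageDigest

import org.apache.spark.{SparkConf, SparkContext}

object Sha1BruteForce {

  // Assumed search space: lowercase passwords of a fixed length.
  val alphabet = "abcdefghijklmnopqrstuvwxyz"
  val passwordLength = 5

  // Map a candidate index to a password string (base-26 encoding of the index).
  def candidate(index: Long): String = {
    val chars = new Array[Char](passwordLength)
    var n = index
    var i = 0
    while (i < passwordLength) {
      chars(i) = alphabet((n % alphabet.length).toInt)
      n /= alphabet.length
      i += 1
    }
    new String(chars)
  }

  def main(args: Array[String]): Unit = {
    val targetHash = args(0) // hex-encoded SHA-1 digest we are trying to match
    val sc = new SparkContext(new SparkConf().setAppName("sha1-brute-force"))

    val numRanges = sc.defaultParallelism // one range per available core
    val totalCandidates = math.pow(alphabet.length, passwordLength).toLong
    val step = (totalCandidates + numRanges - 1) / numRanges

    // RDD of (start, end) index ranges -- one small row per task; the
    // candidate passwords themselves are never stored in the RDD.
    val ranges = sc.parallelize(
      (0L until totalCandidates by step)
        .map(start => (start, math.min(start + step, totalCandidates))),
      numRanges)

    // Each partition walks its range lazily, hashing candidates on the fly.
    val matches = ranges.mapPartitions { iter =>
      val md = MessageDigest.getInstance("SHA-1")
      def sha1Hex(s: String): String =
        md.digest(s.getBytes("UTF-8")).map("%02x".format(_)).mkString
      iter.flatMap { case (start, end) =>
        (start until end).iterator
          .map(candidate)
          .filter(p => sha1Hex(p) == targetHash)
      }
    }

    matches.take(1).headOption match {
      case Some(p) => println(s"Found match: $p")
      case None    => println("No match in the search space")
    }

    sc.stop()
  }
}
```

Using mapPartitions rather than a plain map over individual candidates lets each task reuse a single MessageDigest instance for its whole range instead of re-creating it per candidate, which is the reason for the mapPartitions suggestion above.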