You need a use case where a lot of computation is applied to a little data.
How about any of the various distributed computing projects out there?
Although the SETI@home use case seems like a cool example, I doubt you want
to reimplement its client.
It might be far simpler to reimplement a search
Indeed, rainbow tables are helpful for working on unsalted hashes. They
turn a large amount of computational work into a bit of computational work
and a bit of lookup work. The rainbow tables could easily be captured as
RDDs.
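
To make the trade-off concrete, here is a minimal sketch of the hash-chain idea behind rainbow tables, in plain Python rather than Spark. Everything in it is an assumption for illustration: MD5 as the hash, a toy space of 3-letter lowercase passwords, and the helper names (`reduce_hash`, `walk`, `build_table`, `lookup`). It is not a tuned or complete rainbow-table implementation.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 3       # toy search space: 26**3 passwords (assumption)
CHAIN_LEN = 50   # hash/reduce steps per chain (assumption)


def md5_hex(password):
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def reduce_hash(digest, step):
    """Map a hex digest back into the password space; depends on the step."""
    n = int(digest, 16) + step
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def walk(password, first_step=0):
    """Apply hash-then-reduce for steps first_step..CHAIN_LEN-1."""
    for step in range(first_step, CHAIN_LEN):
        password = reduce_hash(md5_hex(password), step)
    return password


def build_table(starts):
    """Keep only chain endpoints, each mapped to the start(s) reaching it."""
    table = {}
    for start in starts:
        table.setdefault(walk(start), []).append(start)
    return table


def lookup(table, target):
    """Return a password whose MD5 is target, or None if no chain covers it."""
    for pos in range(CHAIN_LEN - 1, -1, -1):
        # Guess that target sits at position pos, then finish that chain.
        end = walk(reduce_hash(target, pos), pos + 1)
        for start in table.get(end, []):
            # Replay the stored chain to confirm (this filters false alarms).
            password = start
            for step in range(CHAIN_LEN):
                if md5_hex(password) == target:
                    return password
                password = reduce_hash(md5_hex(password), step)
    return None


table = build_table(["aaa", "abc", "xyz"])  # arbitrary chain starts
```

The table stores one entry per chain, yet `lookup` can recover any password that appears along a chain, at the cost of some extra hashing per query. The endpoint-to-start pairs are the part that could be held in an RDD.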
I guess I derailed my own discussion by focusing on password cracking,
s
This is actually what I've already mentioned: with rainbow tables kept in
memory, it could be really fast!
Marek
2014-06-12 9:25 GMT+02:00 Michael Cutler :
Hi Nick,
The great thing about any *unsalted* hashes is you can precompute them
ahead of time, then it is just a lookup to find the password which matches
the hash in seconds -- always makes for a more exciting demo than "come
back in a few hours".
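
A rough sketch of that precompute-then-lookup idea, in plain Python. The word list and the choice of MD5 are assumptions made up for the example; a real demo would load a much larger candidate list.

```python
import hashlib


def md5_hex(password):
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def precompute(candidates):
    """Hash every candidate once, ahead of time: digest -> password."""
    return {md5_hex(pw): pw for pw in candidates}


# Hypothetical candidate list, for illustration only.
WORDLIST = ["password", "letmein", "123456", "qwerty"]
table = precompute(WORDLIST)

# At demo time, cracking an unsalted hash is a single dictionary lookup.
leaked = md5_hex("letmein")
print(table.get(leaked))
```

This only works because the hashes are unsalted: with a per-password salt, the same table could not be reused across accounts.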
It is a no-brainer to write a generator function
You can have a huge dictionary of hashes in one RDD and use a map function
to generate a hash for the given password and look it up in your dictionary
RDD. I'm not sure about the performance, though. It would be nice to see it
if you build it.
Thanks
Best Regards
On Thu, Jun 12, 2014 at 7:23 AM, Nicholas Cham
Yes, I mean the RDD would just have elements that define partitions or
ranges within the search space, not actual hashes. It's really just
using the RDD as a control structure, rather than as a real data set.
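
Something like the following is what I have in mind, sketched in plain Python so it runs without a cluster. The 4-letter lowercase search space, MD5, and the helper names (`index_to_password`, `make_ranges`, `search_range`) are all assumptions for illustration. In Spark, the list of ranges would presumably become the RDD, with `search_range` applied in a map.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 4  # toy search space: 26**4 candidates (assumption)


def md5_hex(password):
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def index_to_password(index):
    """Decode a search-space index into a fixed-length password."""
    chars = []
    for _ in range(PW_LEN):
        index, r = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def make_ranges(total, num_ranges):
    """Split [0, total) into contiguous (start, stop) pairs."""
    step = -(-total // num_ranges)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


def search_range(bounds, target):
    """Generate and check candidates on the fly; nothing is stored."""
    start, stop = bounds
    for i in range(start, stop):
        candidate = index_to_password(i)
        if md5_hex(candidate) == target:
            return candidate
    return None


target = md5_hex("demo")
ranges = make_ranges(len(ALPHABET) ** PW_LEN, 8)

# Plain-Python stand-in for something like
# sc.parallelize(ranges).map(...).filter(...) in PySpark.
hits = [pw for pw in (search_range(r, target) for r in ranges) if pw is not None]
print(hits)
```

The data set here is just eight small tuples; all of the work happens inside the function applied to each one.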
As you noted, we don't need to store any hashes. We just need to check them
as they are
What about rainbow tables?
http://en.wikipedia.org/wiki/Rainbow_table
M.
2014-06-12 2:41 GMT+02:00 DB Tsai :
I think creating the samples in the search space within RDD will be
too expensive, and the amount of data will probably be larger than any
cluster.
However, you could create an RDD of search ranges, and each range
will be searched by one map operation. As a result, in this design,
the # of row i