Broder or other near-duplicate algorithms?

Mark Kerzner Mon, 23 Mar 2009 21:23:56 -0700

Hi,

Does anybody know of an open-source implementation of the Broder
algorithm <http://www.std.org/%7Emsm/common/clustering.html> in Hadoop?
Monika Henzinger reports having done so in MapReduce
<http://ltaa.epfl.ch/monika/mpapers/nearduplicates2006.pdf>, and I wonder
whether anybody has repeated her work in open source.


If no implementation exists yet, I plan to write one, and then I will ask
what I can do with the code.

Cheers,
Mark
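
For readers unfamiliar with the algorithm named above, here is a minimal
single-machine sketch of the idea usually attributed to Broder: split each
document into w-word shingles, then estimate the resemblance (Jaccard
similarity) of two documents from short min-hash signatures. It is an
illustration under stated assumptions, not the implementation from either
linked page and not Hadoop code. All function names, the shingle width w=4,
the signature length of 64, and the use of seeded SHA-1 as a stand-in for
min-wise independent permutations are hypothetical choices made here.

```python
import hashlib


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < w:
        # Short documents yield a single shingle, or none if empty.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b):
    """Exact Jaccard resemblance |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle under a given seed.

    Assumption: seeded SHA-1 stands in for a random permutation.
    """
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(shingle_set, num_hashes=64):
    """One minimum hash value per seed; an empty set yields an empty tuple."""
    if not shingle_set:
        return ()
    return tuple(
        min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)
    )


def estimated_resemblance(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

In a MapReduce setting one plausible split would be to compute signatures in
the map phase and group documents that share signature values in the reduce
phase, but that layout is an assumption and is not taken from the paper.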
