Hi,

Does anybody know of an open-source implementation of the Broder algorithm <http://www.std.org/%7Emsm/common/clustering.html> in Hadoop? Monika Henzinger reports having done so in MapReduce <http://ltaa.epfl.ch/monika/mpapers/nearduplicates2006.pdf>, and I wonder whether anyone has repeated her work in open source.
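For reference, here is a rough sketch of what such a job might look like. It is a plain-Python, single-process simulation of the map and reduce steps, not code against the Hadoop API. It assumes the shingling plus min-hash formulation of Broder's method: each document becomes a set of contiguous word w-grams, and salted 64-bit hashes stand in for the random (min-wise) permutations. The function names, the shingle width `w=4`, `num_hashes=64`, and the 0.8 threshold are illustrative choices, not values taken from Broder's or Henzinger's papers. Super-shingles are also omitted.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, w=4):
    """Return the set of contiguous word w-grams ("shingles") of a text."""
    words = text.lower().split()
    if len(words) < w:
        # Very short documents become a single (possibly empty) shingle.
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def _hash(seed, shingle):
    """Salted 64-bit hash; each seed stands in for one random permutation."""
    data = f"{seed}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(text, num_hashes=64, w=4):
    """For each seed, keep the minimum hash over the document's shingles."""
    sh = shingles(text, w)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_hashes)]


def map_phase(docs, num_hashes=64, w=4):
    """Mapper: emit ((seed, min-hash), doc_id) for every sketch entry."""
    for doc_id, text in docs.items():
        for seed, value in enumerate(minhash_sketch(text, num_hashes, w)):
            yield (seed, value), doc_id


def reduce_phase(pairs):
    """Group doc ids by key (the shuffle), then count shared keys per doc pair."""
    groups = defaultdict(set)
    for key, doc_id in pairs:
        groups[key].add(doc_id)
    shared = defaultdict(int)
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return shared


def near_duplicates(docs, num_hashes=64, w=4, threshold=0.8):
    """Return {(a, b): estimated resemblance} for pairs at or above threshold."""
    shared = reduce_phase(map_phase(docs, num_hashes, w))
    return {pair: count / num_hashes
            for pair, count in shared.items()
            if count / num_hashes >= threshold}
```

For example, with the same sentence stored under ids `a` and `b` and an unrelated sentence under `c`, `near_duplicates` should report only the pair `("a", "b")` with an estimated resemblance of 1.0. The pair-counting step is quadratic in the size of each group, so a real Hadoop job would probably need to cap or split very large groups.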
If there is no implementation yet, I am going to write one, and then I will ask what I can do with the code.

Cheers,
Mark