Re: Broder or other near-duplicate algorithms?

Yi-Kai Tsai Tue, 24 Mar 2009 11:20:09 -0700

Hi Mark,

We built something for this on top of Hadoop/HBase (MapReduce for evaluation, HBase for online serving),

based on this paper: http://www2007.org/papers/paper215.pdf
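
For illustration only, here is a minimal sketch of the fingerprinting idea, assuming the paper linked above describes simhash-style fingerprints compared by Hamming distance (that reading is an assumption, so please check the paper itself). This is not our Hadoop/HBase code. The function names, the word tokenizer, and the use of truncated MD5 as the token hash are all hypothetical choices made for the example.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption; choose to suit your corpus)


def tokens(text):
    """Lowercased word tokens. A real system would likely weight features."""
    return re.findall(r"\w+", text.lower())


def simhash(text):
    """Return a 64-bit fingerprint; similar texts tend to differ in few bits."""
    counts = [0] * BITS
    for tok in tokens(text):
        # Stable 64-bit token hash: first 8 bytes of MD5 (illustrative choice).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(BITS):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("the quick brown fox jumps over the lazy dog")
    b = simhash("the quick brown fox jumped over the lazy dog")
    # Two documents are treated as near-duplicates when this distance is small.
    print(hamming(a, b))
```

In a MapReduce setting, one plausible split is for the map step to compute the fingerprint of each document and for a later step to group or probe fingerprints that lie within a small Hamming distance. The distance threshold is something you would have to tune on your own data.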

Hi,

Does anybody know of an open-source implementation of the Broder
algorithm <http://www.std.org/%7Emsm/common/clustering.html> in Hadoop?
Monika Henzinger reports having done so
<http://ltaa.epfl.ch/monika/mpapers/nearduplicates2006.pdf>
in MapReduce, and I wonder if somebody has repeated her work in open source?

I am going to do this if there is no implementation yet, and then I will ask
what I can do with the code.

Cheers,
Mark



--
Yi-Kai Tsai (cuma) <[email protected]>, Asia Search Engineering.
