Spark-powered Wikipedia analysis and exploration

Guillaume Pitel Thu, 27 Mar 2014 08:12:46 -0700

Hi Spark users,

I don't know if this is the right place to announce it, but Spark has a new public use case in a demo we have put online here:

http://wikinsights.org

It lets you explore the English Wikipedia with a few added benefits from our proprietary semantic and relations analysis method: you can see similar pages (based on text content or on links), see the most relevant words for a page, and more.

Spark is used both for processing the English Wikipedia and for the computation itself. Three iterations of our method on the whole matrix of 4.4M documents by 2.1M words take about 30 minutes on a smallish cluster of 7 nodes, each with 4 cores and 32 GB of RAM.
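To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the numbers quoted above. The 8 bytes per cell (64-bit floats) is an assumption for illustration, as are the function names; nothing here describes how the matrix is actually stored or how the method works.

```python
# Back-of-the-envelope arithmetic on the figures quoted in this post.
NODES = 7
CORES_PER_NODE = 4
RAM_GB_PER_NODE = 32
DOCS = 4_400_000
WORDS = 2_100_000
BYTES_PER_CELL = 8  # assumption: one 64-bit float per matrix cell


def cluster_totals(nodes, cores_per_node, ram_gb_per_node):
    """Return (total cores, total RAM in GB) for a homogeneous cluster."""
    return nodes * cores_per_node, nodes * ram_gb_per_node


def dense_size_tb(rows, cols, bytes_per_cell):
    """Size in terabytes (10^12 bytes) of a fully dense rows x cols matrix."""
    return rows * cols * bytes_per_cell / 1e12


total_cores, total_ram_gb = cluster_totals(NODES, CORES_PER_NODE, RAM_GB_PER_NODE)
dense_tb = dense_size_tb(DOCS, WORDS, BYTES_PER_CELL)

print(f"cluster: {total_cores} cores, {total_ram_gb} GB RAM")
print(f"dense matrix would need about {dense_tb:.2f} TB")
```

Under that assumption, a dense matrix would be far larger than the cluster's combined memory, so a document-by-word matrix of this shape would normally be handled in some sparse form. That is an inference from the arithmetic, not a statement about the actual implementation.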

Any feedback is welcome (except on the aesthetics; we already know the UI is really bad).

Enjoy exploring Wikipedia in your spare time :)

Guillaume

Guillaume PITEL, Président
+33(0)6 25 48 86 80

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
