Automatically extracted SpamAssassin FAQs

Stefan Henß Tue, 22 Feb 2011 21:35:10 -0800

Hi everybody,

I'm currently doing research for my bachelor thesis on how to automatically extract FAQs from unstructured data.


For this, I've built a system that automatically performs the following steps:

- Load thousands of conversations from forums and mailing lists (don't mind the categories there).

- Build categorization solely based on the conversation's texts (by clustering).

- Pick the best modelled categories as basis for one FAQ each.

- For each question (first entry in a conversation) find the best reply from its answers.

- Select the most relevant and well-formatted question/answer pairs for each FAQ.
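To make the steps above a bit more concrete, here is a minimal sketch in Python. It is not the actual system: the bag-of-words cosine similarity, the greedy one-pass clustering, the use of cluster size as a stand-in for "best modelled", and all names (`cluster`, `best_reply`, `build_faqs`, the `threshold` and `top_n` parameters, the dict layout of a conversation) are assumptions made only for illustration. The final selection of well-formatted pairs is left out.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (lowercased letter runs only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(conversations, threshold=0.3):
    """Greedy one-pass clustering (an illustrative stand-in, not the thesis method).

    Each conversation joins the first cluster whose seed question is at least
    `threshold` similar to its own question; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, members)
    for conv in conversations:
        vec = vectorize(conv["question"])
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(conv)
                break
        else:
            clusters.append((vec, [conv]))
    return [members for _, members in clusters]


def best_reply(conv):
    """Pick the reply whose wording overlaps most with the question (assumed heuristic)."""
    question_vec = vectorize(conv["question"])
    return max(conv["replies"], key=lambda r: cosine(question_vec, vectorize(r)))


def build_faqs(conversations, top_n=1):
    """Return one list of (question, answer) pairs per selected cluster.

    Cluster size is used here as a crude proxy for how well a category is modelled.
    """
    groups = sorted(cluster(conversations), key=len, reverse=True)[:top_n]
    return [
        [(c["question"], best_reply(c)) for c in group if c["replies"]]
        for group in groups
    ]
```

A conversation here is assumed to be a dict with a `"question"` string and a `"replies"` list of strings; calling `build_faqs(conversations, top_n=3)` would then give question/answer pairs for the three largest clusters.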

For the evaluation part, I'd like to ask you to have a look at one or two FAQs and maybe give some comments on how well the questions match the FAQ's title, how relevant they are, etc.

Here's the direct link to the SpamAssassin FAQs: http://faqcluster.com/spam-spamassassin-mail-rule-rules

And here is quite a good example, in my opinion: http://faqcluster.com/question-2015861564


(There are some other interesting FAQs as well at http://faqcluster.com/)


Thanks for your help

Stefan
