SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM wrote:
Hi all,
I'm a newbie with Lucene and I'm looking to implement the following:
I want to index posts from a forum, and, rather than proposing a search
on the contents, graphically represent the contents of the index. More
precisely, I would like to have a list of the most popular words, with a
number next to each indicating how often they occur. The icing on the cake would be to be able to click on such a word and get a subset of the posts including that word. Can Lucene be used for this? Has anyone already implemented it? Any
links?
I've dug around a bit without any success, but my apologies if this has
already been dealt with


See http://www.getopt.org/luke for an example of such functionality. However, I must disappoint you - the most frequent words in a corpus are quite probably also most useless words. For English these are: the, a, to, for, by, in, can, I, ... So, you will need to eliminate them from the top of the list to get any useful results.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to