SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM wrote:
Hi all,
I'm a newbie with Lucene and I'm looking to implement the following:
I want to index posts from a forum, and, rather than proposing a search
on the contents, graphically represent the contents of the index. More
precisely, I would like to have a list of the most popular words, with a
number next to each indicating how often they occur.
The icing on the cake would be to be able to click on such a word and
get a subset of the posts including that word.
Can Lucene be used for this? Has anyone already implemented it? Any
links?
I've dug around a bit without any success, but my apologies if this has
already been dealt with
See http://www.getopt.org/luke for an example of such functionality.
However, I must disappoint you - the most frequent words in a corpus are
quite probably also most useless words. For English these are: the, a,
to, for, by, in, can, I, ...
So, you will need to eliminate them from the top of the list to get any
useful results.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]