They may not be a dictionary, but there is a limited number of term entries and
they seem regular. Your inquiries indicate you need a faceting feature (or
even an SQL-like set of queries backed by a fast index...), probably with
some pruning.
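To make the faceting idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration under assumptions: the `facet_counts` helper, the field names, and the sample records are hypothetical, and a real deployment would compute the counts from the search index rather than a Python list. The `top_n` argument stands in for the pruning mentioned above.

```python
from collections import Counter

def facet_counts(docs, query_field, query_value, facet_field, top_n=None):
    """Count facet_field values among docs whose query_field equals query_value.

    Hypothetical helper for illustration; top_n prunes to the most frequent values.
    """
    # Step 1: the "search" - keep only documents matching the query.
    matches = [d for d in docs if d.get(query_field) == query_value]
    # Step 2: the "facet" - count distinct values of the grouping field.
    counts = Counter(d[facet_field] for d in matches if facet_field in d)
    # Step 3: the "pruning" - return only the top_n most common groups.
    return counts.most_common(top_n)

# Made-up sample records, not data from this thread.
docs = [
    {"city": "Boston", "product": "router"},
    {"city": "Boston", "product": "switch"},
    {"city": "Boston", "product": "router"},
    {"city": "Austin", "product": "router"},
]

print(facet_counts(docs, "city", "Boston", "product"))
```

The same function works in the other direction (search on product, group by city), which is the "vice-versa" case: `facet_counts(docs, "product", "router", "city")`.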
Clustering is an unsupervised process that attempts to
Thanks Dawid. I was trying to give an example, but this is not
exactly our text. Our fields include things like "user name", "IP
Address", "Application Name", "Port 3", "Byte Count" - all network
related stuff. So, if a user searches on a certain IP address then we
would need to group the result by u
> 1) We index around 20 fields, of which we want to have a grouping option
> for five. For example, a user can search on the name of a city and we
> should have the option to group by products available in that city (and
> vice-versa).
>
Are these fields strictly defined or free text? Because if they are
Thanks Dawid for the reply. Here is what we are trying to do:
1) We index around 20 fields, of which we want to have a grouping option
for five. For example, a user can search on the name of a city and we
should have the option to group by products available in that city (and
vice-versa).
2) We also need
Can you shed some more light on what you're trying to achieve (what is
the purpose of clustering -- are clusters to be utilized for front-end
user interface, further data mining analysis, etc.)?
With the sizes you report Carrot2 won't work for you, I'm afraid, but
Mahout may. Still, there's plenty
Hi Joe,
I'm one of the Carrot2 developers and I have good news for you :) The example of
using Carrot2 with Lucene is in the Carrot2 repository on SourceForge.net (
http://sourceforge.net/projects/carrot2). Please check out the "carrot2"
module (http://cvs.sourceforge.net/viewcvs.py/carrot2/carrot2/)