So this is just the old problem of avoiding reading large, less
frequently accessed fields when you are trying to read just the smaller,
more frequently accessed fields, e.g. titles.
You can achieve this by:
a) Modifying Lucene using something like the code I originally posted
which stops reading
On Tue, 8 Mar 2005 18:10:26 +0000 (GMT), mark harwood wrote:
"to be able" != "able to be"
> OK, I thought you wanted to count terms within the
> title field. If you want to group counts on the whole
> field value change the loop in my last post to this:
>
> for(int i=0;i {
> String fiel
>>> "to be able" != "able to be"
OK, I thought you wanted to count terms within the
title field. If you want to group counts on the whole
field value change the loop in my last post to this:
for(int i=0;i
Hey Mark, thanks for the code sample. I did look into this, but for a book's
title field, for example,
"to be able" != "able to be"
and
"java programmer" != "programmer (java)" - tokenizer will remove the
parentheses
so in my use case at least, a field value isn't simply an array of its terms.
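To make the distinction concrete, here is a minimal sketch in plain Java with no Lucene calls at all. It assumes the field values of the matching documents have already been collected into a list of strings; the class name FieldValueCounts and its methods are hypothetical helpers made up for illustration, not anything from the Lucene API. It compares counting whole field values with counting individual terms:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts field values taken from search hits.
class FieldValueCounts {

    // Count each whole field value as one unit, so "to be able" stays a single key.
    static Map<String, Integer> countWholeValues(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // Count whitespace-separated terms instead, roughly what a term-based count would see.
    static Map<String, Integer> countTerms(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            for (String term : value.trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return the n most frequent keys, highest count first (ties keep insertion order).
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("to be able", "able to be", "to be able");
        System.out.println(top(countWholeValues(titles), 10)); // [to be able=2, able to be=1]
        System.out.println(countTerms(titles));                // {to=3, be=3, able=3}
    }
}
```

With term counts the two titles are indistinguishable, while whole-value counts keep them apart, which is the behaviour described above.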
Your requirement was clear but I guess my suggested
solution wasn't.
Here it is in detail:
public class CountTest
{
    public static void main(String[] args) throws Exception
    {
        RAMDirectory tempDir = new RAMDirectory();
        Analyzer analyzer=new WhitespaceAnalyze
Ah, I apologize. My use of the word "frequency" was misleading. By that, I
meant the number of hits/documents whose fields have that value. Once again:
doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring
doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring
doc c=title:1,keyword
The new TermFreqVector code sounds like what you need
here. This gives you fast access to precomputed totals
of term frequencies for each document.
See IndexReader.getTermFreqVector
Neither. :-)
4) Top 10 fieldvalues (for some fields) returned in search results
So, let's say the results of a search were:
doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring
doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring
doc c=title:1,keyword:b,contents:somelongmemoryhog
Not sure I get what the requirement is yet:
>>Here's my requirement, ..I need to perform a simple
>>"Top 10 most frequent occurring " from a
search.
Does this mean:
1)Top 10 fieldnames present in each of your matching
documents?
2)Top 10 most frequent terms found in a choice of
field?
3)Top 10
Mark,
On Tue, 8 Mar 2005 09:56:37 +0000 (GMT), mark harwood wrote:
>> But I suppose for Document
>> has to be further subclassed so that the other
>> non-initialized fields can be obtained as well, or
>>
> I don't think Document would be the right place for
> this - as a design pattern it is cast