Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread markharw00d
So this is just the old problem of avoiding reading large, less frequently accessed fields when you are trying to read just the smaller, more frequently accessed fields, e.g. titles. You can achieve this by: a) Modifying Lucene using something like the code I originally posted which stops reading
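
The idea of not reading large fields can be illustrated without Lucene itself. The minimal Java sketch below assumes a hypothetical record layout (a field count, then a name, byte length and UTF-8 bytes per field), which is not Lucene's actual stored-fields format. It decodes only the requested fields, skips over the bytes of the others, and stops reading once every requested field has been found. It also assumes each field name occurs at most once per document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stored-document layout (NOT Lucene's real format):
// [int fieldCount], then per field: [UTF name][int byteLength][UTF-8 bytes]
class SelectiveFieldReader {

    // Serialize a document in the layout above.
    static byte[] write(Map<String, String> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            out.writeUTF(e.getKey());
            out.writeInt(value.length);
            out.write(value);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Decode only the fields named in 'wanted'; skip the bytes of all others
    // and stop as soon as every wanted field has been read.
    static Map<String, String> read(byte[] record, Set<String> wanted) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        Map<String, String> result = new LinkedHashMap<>();
        int fieldCount = in.readInt();
        for (int i = 0; i < fieldCount && result.size() < wanted.size(); i++) {
            String name = in.readUTF();
            int length = in.readInt();
            if (wanted.contains(name)) {
                byte[] value = new byte[length];
                in.readFully(value);
                result.put(name, new String(value, StandardCharsets.UTF_8));
            } else {
                skipFully(in, length); // large, unwanted field: never decoded
            }
        }
        return result;
    }

    // skipBytes may skip fewer bytes than requested, so loop until done.
    private static void skipFully(DataInputStream in, int n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes(n);
            if (skipped <= 0) throw new EOFException("record truncated");
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "1");
        doc.put("contents", "somelongmemoryhoggingstring");
        doc.put("keyword", "a");
        byte[] record = write(doc);
        System.out.println(read(record, Set.of("title", "keyword"))); // {title=1, keyword=a}
    }
}
```

In a real index the saving comes from never decoding (or, with early exit, never touching) the bytes of the big field.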

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
On Tue, 8 Mar 2005 18:10:26 + (GMT), mark harwood wrote:  "to be able" != "able to be" > OK, I thought you wanted to count terms within the > title field. If you want to group counts on the whole > field value change the loop in my last post to this: > > for(int i=0;i { > String fiel

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
>>> "to be able" != "able to be" OK, I thought you wanted to count terms within the title field. If you want to group counts on the whole field value change the loop in my last post to this: for(int i=0;i

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Hey Mark, thanks for the code sample. I did look into this, but for a book's title field, for example, "to be able" != "able to be" and "java programmer" != "programmer (java)" - the tokenizer will remove the parentheses, so in my use case at least, a field value isn't simply an array of its terms.
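
Kelvin's point can be checked with a small Java sketch. The `terms` method is a rough stand-in for an analyzer (an assumption, not a Lucene `Analyzer`): it lower-cases the value and splits on anything that is not a letter or digit. Both pairs of distinct titles collapse to the same term set, so counting terms cannot recover counts of whole field values.

```java
import java.util.Set;
import java.util.TreeSet;

// Different title values can reduce to the same set of terms once tokenized,
// so term counts cannot stand in for counts of whole field values.
class TermsVersusValues {

    // Rough stand-in for an analyzer (assumption, not a Lucene Analyzer):
    // lower-case, then split on any run of characters that are not letters or digits.
    static Set<String> terms(String fieldValue) {
        Set<String> result = new TreeSet<>();
        for (String t : fieldValue.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) result.add(t); // a leading separator yields one empty token
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(terms("java programmer"));   // [java, programmer]
        System.out.println(terms("programmer (java)")); // [java, programmer]
        System.out.println(terms("to be able").equals(terms("able to be"))); // true
        System.out.println("java programmer".equals("programmer (java)"));  // false
    }
}
```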

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
Your requirement was clear but I guess my suggested solution wasn't. Here it is in detail: public class CountTest { public static void main(String[] args) throws Exception { RAMDirectory tempDir = new RAMDirectory(); Analyzer analyzer=new WhitespaceAnalyze

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Ah, I apologize. My use of the word "frequency" was misleading. By that, I meant the number of hits/documents whose fields have that value. Once again: doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring doc c=title:1,keyword
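
Under that definition, the count for a value is the number of matching documents whose field holds exactly that value. A minimal Java sketch, assuming each hit has already been reduced to a map of its small stored fields (so the long contents field is never loaded); `TopFieldValues` and `top` are hypothetical names, and the sample hits follow the a/b/c example in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopFieldValues {

    // For one field, count how many hits carry each whole (untokenized) value,
    // then return the n values with the highest counts.
    // Each hit is modelled as a field-name -> stored-value map (an assumption).
    static List<Map.Entry<String, Integer>> top(List<Map<String, String>> hits, String field, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : hits) {
            String value = doc.get(field);
            if (value != null) counts.merge(value, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so output is stable.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
                Map.of("title", "1", "keyword", "a"),
                Map.of("title", "1", "keyword", "a"),
                Map.of("title", "1", "keyword", "b"));
        System.out.println(top(hits, "keyword", 10)); // [a=2, b=1]
    }
}
```

Grouping on the stored, untokenized value keeps "java programmer" and "programmer (java)" as separate entries.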

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
The new TermFreqVector code sounds like what you need here. This gives you fast access to precomputed totals of term frequencies for each document. See IndexReader.getTermFreqVector.
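
To make "precomputed totals of term frequencies for each document" concrete, here is a plain-Java stand-in (hypothetical `SimpleTermFreqVector`, with whitespace tokenization assumed). Its `getTerms` and `getTermFrequencies` methods are named after the corresponding methods of Lucene's `TermFreqVector`, but this is not Lucene code: it holds one document's terms in sorted order with a parallel array of counts.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a per-document term frequency vector:
// sorted terms plus a parallel array of occurrence counts. Not Lucene code.
class SimpleTermFreqVector {
    private final String[] terms;
    private final int[] freqs;

    SimpleTermFreqVector(String fieldValue) {
        // Assumed tokenization: lower-case, split on whitespace.
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps terms sorted
        for (String t : fieldValue.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) counts.merge(t, 1, Integer::sum);
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = counts.values().stream().mapToInt(Integer::intValue).toArray();
    }

    String[] getTerms() { return terms.clone(); }

    int[] getTermFrequencies() { return freqs.clone(); }

    // Binary search works because the terms are kept sorted; -1 if absent.
    int indexOf(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? i : -1;
    }

    public static void main(String[] args) {
        SimpleTermFreqVector v = new SimpleTermFreqVector("to be or not to be");
        System.out.println(Arrays.toString(v.getTerms()));           // [be, not, or, to]
        System.out.println(Arrays.toString(v.getTermFrequencies())); // [2, 1, 1, 2]
    }
}
```

Note that such a vector counts occurrences of terms within a single document.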

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Neither. :-) 4) Top 10 field values (for some fields) returned in search results. So, let's say the results of a search were: doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring doc c=title:1,keyword:b,contents:somelongmemoryhog

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
Not sure I get what the requirement is yet: >>Here's my requirement, ..I need to perform a simple >>"Top 10 most frequent occurring " from a search. Does this mean: 1)Top 10 fieldnames present in each of your matching documents? 2)Top 10 most frequent terms found in a choice of field? 3)Top 10

Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Mark, On Tue, 8 Mar 2005 09:56:37 + (GMT), mark harwood wrote: >> But I suppose for Document >> has to be further subclassed so that the other >> non-initialized fields can be obtained as well, or >> > I don't think Document would be the right place for > this - as a design pattern it is cast

Re: Fast access to a random page of the search results.

2005-03-08 Thread mark harwood
> But I suppose for Document > has to be further subclassed so that the other > non-initialized fields can be obtained as well, or I don't think Document would be the right place for this - as a design pattern it is cast as a "value object" or "transfer object" which is passed to (potentially remo
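
One way to follow this design is to leave the value object untouched and put on-demand loading in a separate accessor that holds the document number and a way back to the index. A minimal Java sketch; `FieldSource` and `LazyFieldAccessor` are hypothetical, not Lucene APIs. Small fields are supplied eagerly, any other field is fetched on first request and then cached, and `snapshot()` returns a plain map that can be handed on like a transfer object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical interface (not a Lucene API): fetches named stored fields
// of one document from wherever the index lives.
interface FieldSource {
    Map<String, String> load(int docId, Set<String> fieldNames);
}

// Keeps the plain field values separate from the logic that loads them.
class LazyFieldAccessor {
    private final int docId;
    private final FieldSource source;
    private final Map<String, String> loaded = new HashMap<>();

    // 'eager' holds the small fields (e.g. title) read when the hit was fetched.
    LazyFieldAccessor(int docId, Map<String, String> eager, FieldSource source) {
        this.docId = docId;
        this.source = source;
        this.loaded.putAll(eager);
    }

    // Returns the field's value, asking the source only the first time a
    // field that was not loaded eagerly is requested (misses are cached as null).
    String get(String field) {
        if (!loaded.containsKey(field)) {
            loaded.put(field, source.load(docId, Set.of(field)).get(field));
        }
        return loaded.get(field);
    }

    // Unmodifiable copy of everything loaded so far: a plain value to pass on.
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(loaded));
    }

    public static void main(String[] args) {
        int[] loads = {0};
        FieldSource source = (id, names) -> {
            loads[0]++; // count trips back to the (simulated) index
            return Map.of("contents", "somelongmemoryhoggingstring");
        };
        LazyFieldAccessor hit = new LazyFieldAccessor(0, Map.of("title", "1"), source);
        System.out.println(hit.get("title") + " " + loads[0]); // 1 0
        hit.get("contents");
        hit.get("contents");
        System.out.println(loads[0]); // 1
    }
}
```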

Re: Fast access to a random page of the search results.

2005-03-07 Thread Kelvin Tan
Hi Mark, partially, yes. But I suppose for Document has to be further subclassed so that the other non-initialized fields can be obtained as well, or perhaps an additional method to init the remaining fields from a partially initialized Doc? Thanks for responding.. k On Mon, 07 Mar 2005 21:00:

Re: Fast access to a random page of the search results.

2005-03-07 Thread markharw00d
Did you mean this? http://marc.theaimsgroup.com/?l=lucene-user&m=108525376821114&w=2 Kelvin Tan wrote: This is a bump post... I'm wondering if there's any code (contributed, bugzilla, core or otherwise) that provides document lazy-loading functionality, i.e. only eager-initialize specific fields

Re: Fast access to a random page of the search results.

2005-03-07 Thread Kelvin Tan
This is a bump post... I'm wondering if there's any code (contributed, bugzilla, core or otherwise) that provides document lazy-loading functionality, i.e. only eager-initialize specific fields, or load fields on-demand. Thanks, k On Thu, 3 Mar 2005 13:55:00 +0100, Kelvin Tan wrote: > Is this

Re: Fast access to a random page of the search results.

2005-03-03 Thread Kelvin Tan
Is this actually in the codebase? I couldn't find it in SVN or in Bugzilla... kelvin On Mon, 28 Feb 2005 11:59:54 -0500, Erik Hatcher wrote: > Or perhaps you > need to investigate the (is it in the codebase already?) patch to > load fields lazily upon demand instead. -

Re: Fast access to a random page of the search results.

2005-03-02 Thread Stanislav Jordanov
Sent: Wednesday, March 02, 2005 12:04 AM Subject: Re: Fast access to a random page of the search results. > Daniel Naber wrote: > > After fixing this I can reproduce the problem with a local index that > > contains about 220.000 documents (700MB). Fetching the first document

Re: Fast access to a random page of the search results.

2005-03-02 Thread Stanislav Jordanov
oug Cutting" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, March 01, 2005 8:15 PM Subject: Re: Fast access to a random page of the search results. > Stanislav Jordanov wrote: > > startTs = System.currentTimeMillis(); >
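
For the paging question in this thread's subject, jumping to page p only requires loading stored fields for the hits in that page's range. A small Java sketch of that arithmetic, assuming 0-based pages and an exclusive end index; `PageRange` is a hypothetical helper, not a Lucene class.

```java
// 0-based page 'page' of size 'pageSize' covers hits [start, end).
// Hypothetical helper, not a Lucene class.
class PageRange {
    final int start; // first hit on the page
    final int end;   // one past the last hit on the page

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    static PageRange of(int page, int pageSize, int totalHits) {
        if (page < 0 || pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("bad paging arguments");
        }
        long first = (long) page * pageSize; // long guards against int overflow
        int s = (int) Math.min(first, totalHits);
        int e = (int) Math.min(first + pageSize, totalHits);
        return new PageRange(s, e); // past the last page: empty range at totalHits
    }

    static int pageCount(int totalHits, int pageSize) {
        return (int) (((long) totalHits + pageSize - 1) / pageSize);
    }

    public static void main(String[] args) {
        PageRange r = PageRange.of(3, 10, 35);
        System.out.println(r.start + ".." + r.end + " of " + pageCount(35, 10) + " pages"); // 30..35 of 4 pages
        for (int i = r.start; i < r.end; i++) {
            // load only the documents on this page here,
            // e.g. hits.doc(i) with the Hits class of Lucene 1.x
        }
    }
}
```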