Re: One (large) field shared by many documents

2007-05-20 Thread Paul Elschot
On Sunday 20 May 2007 19:52, Peter Bloem wrote: > Thanks for your reply. This is getting me much deeper into the uncharted territories of Lucene, especially the area of FieldCaches, but it's also piqued my curiosity. Most of what I've been able to find are discussions by people that are al

Re: One (large) field shared by many documents

2007-05-20 Thread Peter Bloem
My comments on storing document IDs are perhaps based on a misguided view of Lucene, but it's worth investigating. I figured since there's only one document per ID in the document index, instead of executing one query with n OR clauses, you could execute n queries with a single docId to get al

Re: One (large) field shared by many documents

2007-05-20 Thread Peter Bloem
Thanks for your reply. This is getting me much deeper into the uncharted territories of Lucene, especially the area of FieldCaches, but it's also piqued my curiosity. Most of what I've been able to find are discussions by people that are already using FieldCache, rather than explanations of wha

Re: One (large) field shared by many documents

2007-05-20 Thread Erick Erickson
See Paul's e-mail; he's talking about a place I haven't been in Lucene yet. Other than that, see below. On 5/19/07, Peter Bloem <[EMAIL PROTECTED]> wrote: Ah, now we're getting somewhere. So I run the first query on the collection index, get a set of collection IDs from that. But how do I

Re: One (large) field shared by many documents

2007-05-20 Thread Paul Elschot
On Sunday 20 May 2007 02:49, Peter Bloem wrote: > Ah, now we're getting somewhere. So I run the first query on the collection index, get a set of collection IDs from that. But how do I use them in the second query on the document index? It should be easy enough to retrieve all documents i

Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
Ah, now we're getting somewhere. So I run the first query on the collection index, get a set of collection IDs from that. But how do I use them in the second query on the document index? It should be easy enough to retrieve all documents in the returned collections (which is what I'm after), b

Re: One (large) field shared by many documents

2007-05-19 Thread Erick Erickson
You're right, your index will bloat considerably. In fact, I'm surprised it's only a factor of 5. The only thing that comes to mind is really a variant on your approach from your first e-mail. But I wouldn't use document IDs because document IDs can change. So using doc IDs is...er fraught

Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
I'm sorry, I should have explained the intended behavior more clearly. The basic idea (without the collection fields) is that there are very simple documents in the index with one content field each. All I do with this index is a standard search in this text field. To improve the search result

Re: One (large) field shared by many documents

2007-05-19 Thread Erick Erickson
This seems kind of kludgy, but that may just mean I don't understand your problem very well. What is it that you're trying to accomplish? Searching constrained by topic or groups? If you're trying to search by groups, search the archive for the word "facet" or "faceted search". Otherwise, could

One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
Hi, I have the following problem. I'm indexing documents that belong to some collection (i.e., the dataset is divided into collections, which are divided into documents). These documents become my Lucene documents, with some relatively small string that becomes the field I want to search. Howev