Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
Ah, now we're getting somewhere. So I run the first query on the collection index and get a set of collection IDs from that. But how do I use them in the second query on the document index? It should be easy enough to retrieve all documents in the returned collections (which is what I'm after), b
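
The two-step lookup described in this message can be sketched roughly as follows. This is only an illustration using plain in-memory dictionaries, not Lucene code: the names `search_collections`, `documents_in_collections`, and the `collection_id` key are hypothetical, and the naive whole-word match stands in for whatever query the collection index would really run. It also assumes each document carries a stable, application-assigned collection ID rather than an internal index document number.

```python
def search_collections(collection_index, term):
    """Step 1: return the IDs of collections whose text contains the term.

    collection_index maps a collection ID to that collection's text
    (a hypothetical stand-in for the separate collection index).
    """
    wanted = term.lower()
    return {
        cid
        for cid, text in collection_index.items()
        if wanted in text.lower().split()
    }


def documents_in_collections(document_index, collection_ids):
    """Step 2: return the IDs of documents that belong to any matched collection.

    document_index maps a document ID to a dict with a "collection_id" key
    (again a hypothetical layout, not something stated in the thread).
    """
    return [
        doc_id
        for doc_id, doc in document_index.items()
        if doc["collection_id"] in collection_ids
    ]


collections = {
    "c1": "medieval maps of europe",
    "c2": "modern street photography",
}
documents = {
    "d1": {"collection_id": "c1", "content": "map of flanders"},
    "d2": {"collection_id": "c2", "content": "street at night"},
    "d3": {"collection_id": "c1", "content": "map of bavaria"},
}

matched = search_collections(collections, "maps")
hits = documents_in_collections(documents, matched)
```

In an actual index the second step would presumably become a query or filter on an indexed collection-ID field rather than a scan over every document.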

Re: One (large) field shared by many documents

2007-05-19 Thread Erick Erickson
You're right, your index will bloat considerably. In fact, I'm surprised it's only a factor of 5. The only thing that comes to mind is really a variant on your approach from your first e-mail. But I wouldn't use document IDs, because document IDs can change. So using doc IDs is...er fraught

Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
I'm sorry, I should have explained the intended behavior more clearly. The basic idea (without the collection fields) is that there are very simple documents in the index with one content field each. All I do with this index is a standard search in this text field. To improve the search result

Re: What is the best way to split substring words

2007-05-19 Thread Erick Erickson
You probably should write a custom analyzer and/or filter that breaks your streams up into the custom tokens you want. Depending upon what you're really trying to accomplish, you may well need to use the same analyzer at BOTH index and search times. Best Erick On 5/19/07, bhecht <[EMAIL PROTECT

Re: One (large) field shared by many documents

2007-05-19 Thread Erick Erickson
This seems kind of kludgy, but that may just mean I don't understand your problem very well. What is it that you're trying to accomplish? Searching constrained by topic or groups? If you're trying to search by groups, search the archive for the word "facet" or "faceted search". Otherwise, could

What is the best way to split substring words

2007-05-19 Thread bhecht
Hi there, I want to be able to split tokens by giving a list of substring words. So I can give a list of subwords like "strasse", "gasse", and the token "mainstrasse" or "maingasse" will be split into 2 tokens: "main" and "strasse" (or "gasse"). Thanks
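
A minimal sketch of the splitting behaviour asked for here, in plain Python rather than as a Lucene analyzer or filter. The function name `split_compound` is made up for illustration, and it assumes the subword only needs to be matched at the end of the token, which fits the "mainstrasse" example but may be narrower than what the poster wants.

```python
def split_compound(token, subwords):
    """Split a token into [prefix, subword] if it ends with a listed subword.

    Assumption: only suffix matches are handled. Tokens that equal a
    subword exactly, or that match nothing, are returned unchanged.
    """
    lower = token.lower()
    # Try longer subwords first so the most specific suffix wins.
    for sub in sorted(subwords, key=len, reverse=True):
        if lower.endswith(sub) and len(lower) > len(sub):
            cut = len(token) - len(sub)
            return [token[:cut], token[cut:]]
    return [token]


subwords = ["strasse", "gasse"]
street = split_compound("mainstrasse", subwords)
alley = split_compound("maingasse", subwords)
```

Whatever form the real filter takes, the same splitting would likely need to run at both index and search time so that query terms line up with the indexed tokens.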

One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
Hi, I have the following problem. I'm indexing documents that belong to some collection (i.e. the dataset is divided into collections, which are divided into documents). These documents become my Lucene documents, with some relatively small string that becomes the field I want to search. Howev