Re: best strategy to deal with large index file

2005-12-16 Thread Dave Kor
On 12/17/05, Jeff Liang <[EMAIL PROTECTED]> wrote: > Thanks for the reply. > I'm indexing emails. The fields are the common email attributes: > subject, content, attachment, message size, date, sender, recipients, > etc. The index is a few GB. Is there a good practice to keep the index > file siz

RE: best strategy to deal with large index file

2005-12-16 Thread Jeff Liang
Thanks for the reply. I'm indexing emails. The fields are the common email attributes: subject, content, attachment, message size, date, sender, recipients, etc. The index is a few GB. Is there a good practice to keep the index file size at a certain level? When I do a search on the date field th

Re: best strategy to deal with large index file

2005-12-16 Thread Dan Funk
Are there specific queries that cause the out-of-memory problem, or will any query do it? How large is the index? MultiSearcher allows you to search over multiple indexes, and is well supported throughout the API. How you split your indexes depends on what you want to achieve. There are many

Re: all stop words in exact phrase get 0 hits

2005-12-16 Thread Yonik Seeley
Ah, sorry. Still, this doesn't seem like the most desirable behavior. It would be nice if we could find a way to fix it. -Yonik On 12/16/05, javier muguruza <[EMAIL PROTECTED]> wrote: > Yonik, > > this was due to the additional parentheses I was using, see my last > email. I think I'll rewrite wi

best strategy to deal with large index file

2005-12-16 Thread Jeff Liang
Hi all, my index file is huge because of a large set of data. When I do a search, I sometimes get an out-of-memory exception. I don't know what usually causes the out-of-memory exception. Is it during the search because the index file is too big, or because there are too many hits? memory exception

Re: all stop words in exact phrase get 0 hits

2005-12-16 Thread javier muguruza
Yonik, this was due to the additional parentheses I was using; see my last email. I think I'll rewrite it with the Lucene API as Erik said. Thanks, javi On 12/16/05, Yonik Seeley <[EMAIL PROTECTED]> wrote: > I can't reproduce this behavior with the current version of Lucene. > > +text:solar => 112

Re: all stop words in exact phrase get 0 hits

2005-12-16 Thread Yonik Seeley
I can't reproduce this behavior with the current version of Lucene. +text:solar => 112 docs +text:"a a a" => 0 docs because a is a stop word +text:"solar" +text:"a a a" => 112 docs -Yonik On 12/15/05, javier muguruza <[EMAIL PROTECTED]> wrote: > Hi, > > Suppose I have a query like this: > +att
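Yonik's three results are consistent with a query parser that silently drops any required clause whose terms are all stop words: on its own, +text:"a a a" leaves nothing to match (0 docs), while next to +text:"solar" only the solar clause survives (112 docs). The sketch below models that behavior in plain Java rather than with the Lucene API. The stop-word list, the in-memory documents, and the class and method names are all hypothetical, and phrase order is ignored (each clause is treated as a set of required terms).

```java
import java.util.*;

public class StopWordClauseDemo {
    // Hypothetical stop-word list; "a" is a stop word, as in the thread.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // "Analyze" a clause: lowercase, split on whitespace, drop stop words.
    // An empty result means the whole clause disappears.
    public static List<String> analyze(String clause) {
        List<String> terms = new ArrayList<>();
        for (String t : clause.toLowerCase().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) terms.add(t);
        }
        return terms;
    }

    // AND together the required clauses over tokenized docs and return matching ids.
    public static List<Integer> search(List<Set<String>> docs, String... requiredClauses) {
        List<List<String>> kept = new ArrayList<>();
        for (String c : requiredClauses) {
            List<String> terms = analyze(c);
            if (!terms.isEmpty()) kept.add(terms); // all-stop-word clause is dropped
        }
        List<Integer> hits = new ArrayList<>();
        if (kept.isEmpty()) return hits; // nothing left to match: zero hits
        for (int id = 0; id < docs.size(); id++) {
            boolean ok = true;
            for (List<String> terms : kept) {
                if (!docs.get(id).containsAll(terms)) { ok = false; break; }
            }
            if (ok) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("solar", "power"),
            Set.of("wind", "power"),
            Set.of("solar", "panel"));
        System.out.println(search(docs, "solar"));          // [0, 2]
        System.out.println(search(docs, "a a a"));          // []
        System.out.println(search(docs, "solar", "a a a")); // [0, 2]
    }
}
```

The design point is that dropping the empty clause, rather than keeping it as a required clause that can never match, is what lets the combined query return the same hits as +text:"solar" alone.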

Re: How does Lucene compare to Dieselpoint?

2005-12-16 Thread Chris Hostetter
:We've been using Lucene here and like it, but we've been asked to look : into another engine also (Dieselpoint). Has anyone used both Dieselpoint and : Lucene? Any comments? We have a lot of documents (50 million+); each document : contains many small fields (maybe 100s). Important features we

Re: How to retrieve distinct field matches?

2005-12-16 Thread Michael D. Curtin
Plat wrote: Basically, pretend I do a regular search for "category:fiction". After stemming/etc, this would match any Document with a category of "fiction", "non-fiction", "fictitious", etc. All 900+ of them. BUT as far as the results are concerned, I'm not actually interested in each Document

Re: Directory implementation on JAR's

2005-12-16 Thread Erik Hatcher
If you need to transfer an index from one machine to another all you need to do is move the index directory (by zipping it if you want). There isn't a need to put an index in a JAR just for transfer purposes. There has been some work on a ZipDirectory (or JARDirectory?, not sure what it wa
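Erik's point is that an index is just a directory of files, so it can be moved as-is, zipped if convenient. A minimal sketch of that zip-and-unpack step using only JDK classes (java.util.zip and java.nio.file) follows. The class name and paths are hypothetical, and it assumes no IndexWriter is modifying the index while the files are being copied.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.*;

public class ZipIndexDir {
    // Zip every regular file under indexDir, storing paths relative to indexDir.
    public static void zip(Path indexDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : files) {
                // Use '/' separators inside the archive regardless of platform.
                String name = indexDir.relativize(p).toString().replace(File.separatorChar, '/');
                out.putNextEntry(new ZipEntry(name));
                Files.copy(p, out); // copies file bytes; does not close the stream
                out.closeEntry();
            }
        }
    }

    // Unpack zipFile into targetDir on the receiving machine.
    public static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry e;
            while ((e = in.getNextEntry()) != null) {
                Path dest = root.resolve(e.getName()).normalize();
                // Refuse entries that would escape the target directory.
                if (!dest.startsWith(root)) throw new IOException("bad entry: " + e.getName());
                if (e.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                Files.createDirectories(dest.getParent());
                // Reads only the current entry; ZipInputStream signals its end with -1.
                Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

On the second box the unpacked directory would then be opened like any other index directory.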

Directory implementation on JAR's

2005-12-16 Thread Guenter Kukies
Hi, does anybody have a solution for writing/reading an index from a JAR archive, something like a 'JARDirectory'? I need to create an index on one box, transfer the index to another, and use it there. Günter

Re: lucene arrayIndexOutOfBoundesException

2005-12-16 Thread Erik Hatcher
Steve, Have you tried the trunk version of Lucene to ensure it fixes the issue you've encountered? That would be helpful information. Erik On Dec 16, 2005, at 3:56 AM, Steve Gaunt wrote: Hi, We have an index of around 100,000 documents. When we do a search for "The inte

lucene arrayIndexOutOfBoundesException

2005-12-16 Thread Steve Gaunt
Hi, We have an index of around 100,000 documents. When we do a search for "The integration of ERP into a logistics curriculum: applying a systems" we get an index-out-of-bounds exception. There is a bug in Bugzilla that describes this problem: bug number 10052. Howev

Re: How to retrieve distinct field matches?

2005-12-16 Thread Erik Hatcher
This is pretty much the same problem that many of us have faced when it comes to faceted browsing. I'm using a set of cached BitSets that represent the documents that have a specific category (or general "facet" in my case). I do a full-text search for "some query expression", using Quer
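The cached-BitSet approach Erik describes can be sketched with plain java.util.BitSet: keep one BitSet per category (bit i set when document i has that category), AND it with a BitSet of the documents matching the query, and read the count from cardinality(). Listing only categories with a nonzero count also gives the distinct values asked about in the "fiction" thread. How the query's matches become a BitSet is not shown in the preview, so here it is built by hand; the class and method names are hypothetical.

```java
import java.util.*;

public class FacetCounts {
    // Cached per-category document sets: bit i is set if document i has that category.
    private final Map<String, BitSet> categoryBits = new TreeMap<>();

    public void addDocument(int docId, String... categories) {
        for (String c : categories) {
            categoryBits.computeIfAbsent(c, k -> new BitSet()).set(docId);
        }
    }

    // Count, per category, how many of the query's matching documents have it.
    // Categories with zero matches are omitted, so the keys are the distinct values.
    public Map<String, Integer> count(BitSet queryMatches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet both = (BitSet) e.getValue().clone(); // don't mutate the cached set
            both.and(queryMatches);
            int n = both.cardinality();
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    public static void main(String[] args) {
        FacetCounts facets = new FacetCounts();
        facets.addDocument(0, "fiction");
        facets.addDocument(1, "non-fiction");
        facets.addDocument(2, "fiction", "mystery");
        facets.addDocument(3, "poetry");

        BitSet matches = new BitSet();
        matches.set(0, 3); // pretend docs 0..2 matched the full-text query
        System.out.println(facets.count(matches)); // {fiction=2, mystery=1, non-fiction=1}
    }
}
```

Cloning before and() keeps the cached sets reusable across queries; the cost per query is one AND and one cardinality() per category, independent of how many documents match.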