Re: How about Lucene's delete performance?

2010-10-13 Thread Dan OConnor
Jeff, I would suggest not deleting documents off the back of the index unless you can optimize your index regularly (depending on your volume, this could be every day or once a week). I would suggest having two indexes, one for "this" week and one for "last" week, and a multi-index searc
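A minimal sketch of the two-index idea in plain Java, not Lucene code: in the hypothetical `WeeklyIndexRotation` class below, a `List<String>` stands in for each index and a `Predicate<String>` stands in for a query. At each weekly rollover the whole "last week" index is discarded instead of deleting documents one at a time. In Lucene itself the combined search would go through something like `MultiReader` or the MultiSearcher mentioned in these threads; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the "this week / last week" scheme. Documents are
 * never deleted individually; the older index is dropped as a whole at each
 * weekly rollover. A List<String> stands in for a real index.
 */
class WeeklyIndexRotation {
    private List<String> thisWeek = new ArrayList<>();
    private List<String> lastWeek = new ArrayList<>();

    /** New documents always go into the current week's index. */
    void add(String doc) {
        thisWeek.add(doc);
    }

    /** Weekly rollover: discard last week's index wholesale, promote this week's. */
    void rollover() {
        lastWeek = thisWeek;          // the previous "last week" list becomes garbage
        thisWeek = new ArrayList<>(); // start an empty "this week" index
    }

    /** Stand-in for a multi-index search: query both indexes and merge the hits. */
    List<String> search(Predicate<String> query) {
        List<String> hits = new ArrayList<>();
        for (String d : thisWeek) if (query.test(d)) hits.add(d);
        for (String d : lastWeek) if (query.test(d)) hits.add(d);
        return hits;
    }
}
```

Dropping a whole index is cheap compared with deleting documents from it, which is why this avoids the need for frequent optimizes.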

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Dan OConnor
Shelly: You wouldn't necessarily have to use a MultiSearcher. A suggested alternative is:
- shard into 10 indices. If you need the concept of a date range search, I would assign the documents to the shard by date; otherwise random assignment is fine.
- have a pool of IndexSearchers for each in
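A minimal sketch of the shard-assignment step, assuming (hypothetically) that "by date" means one calendar day per shard, cycling through the 10 shards by epoch day. `ShardRouter` and its methods are illustrative names, not Lucene API, and the per-shard pool of IndexSearchers is not modeled.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical shard router for a 10-way split. Date-based routing sends all
 * documents from the same calendar day to the same shard, so a date-range
 * query only has to search the shards holding those days.
 */
class ShardRouter {
    static final int NUM_SHARDS = 10;

    /** Date-based assignment: one shard per day, cycling through the shards. */
    static int shardForDate(LocalDate day) {
        return (int) Math.floorMod(day.toEpochDay(), (long) NUM_SHARDS);
    }

    /** Random assignment, fine when queries never filter on date. */
    static int randomShard() {
        return ThreadLocalRandom.current().nextInt(NUM_SHARDS);
    }

    /** Shards a date-range query [from, to] must search, each listed once. */
    static List<Integer> shardsForRange(LocalDate from, LocalDate to) {
        List<Integer> shards = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to) && shards.size() < NUM_SHARDS; d = d.plusDays(1)) {
            int s = shardForDate(d);
            if (!shards.contains(s)) shards.add(s);
        }
        return shards;
    }
}
```

Cycling by day spreads any range of up to 10 days over distinct shards; giving each shard a contiguous block of dates is an alternative if most queries only touch recent data.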

Re: Storing a Lucene Index on a SAN Storage: good idea?

2009-09-26 Thread Dan OConnor
We have two instances of a search system containing 40 million documents, with identical JVM versions, Lucene jars, and code. One runs on local disks; the other is on a SAN. The instance on local disks consistently far outperforms the SAN instance. I'd also second Uwe's sentiments. An in

RE: Indexing large files? - No answers yet...

2009-09-11 Thread Dan OConnor
Paul: My first suggestion would be to update your JVM to the latest version (or at least .14). Several garbage-collection-related issues were resolved in versions 10 through 13 (especially ones dealing with large heaps). Next, knowing your IndexWriter parameters would help figure out why you are using so mu

RE: indexing 100GB of data

2009-07-22 Thread Dan OConnor
Hi Jamie, I would appreciate it if you could provide details on the hardware/OS you are running this system on, what kind of search response times you are getting, and how you add email data to your index. Thanks, Dan -----Original Message----- From: Jamie [mailto:ja...@stimulussoft.com

Re: is there a way to control when merges happen?

2009-05-15 Thread Dan OConnor
09 at 1:41 PM, Dan OConnor wrote:
> All:
>
> I would like to be able to control when an index merge happens (by wall
> clock time) so that merges do not occur in the middle of the business day.
>
> I have a lucene system based on v2.3.2 and we add a couple hundred thousand
> d

Re: is there a way to control when merges happen?

2009-05-15 Thread Dan OConnor
e On Fri, May 15, 2009 at 4:41 PM, Dan OConnor wrote:
> All:
>
> I would like to be able to control when an index merge happens (by wall clock
> time) so that merges do not occur in the middle of the business day.
>
> I have a lucene system based on v2.3.2 and we add a
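One way to act on the wall-clock requirement is to gate any explicitly triggered merge work (such as an optimize) behind a time check. The sketch below assumes business hours of 08:00 to 18:00, Monday to Friday, in server local time; `MergeWindow` is a hypothetical helper, not part of Lucene. It does not by itself stop the merges IndexWriter triggers automatically as segments accumulate; it only decides when your own scheduled job may run.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

/**
 * Hypothetical wall-clock gate for merge-heavy maintenance work.
 * Assumed business hours: 08:00 (inclusive) to 18:00 (exclusive), Mon-Fri.
 */
class MergeWindow {
    static final LocalTime OPEN = LocalTime.of(8, 0);
    static final LocalTime CLOSE = LocalTime.of(18, 0);

    /** True if the timestamp falls inside the assumed business day. */
    static boolean isBusinessHours(LocalDateTime t) {
        DayOfWeek dow = t.getDayOfWeek();
        boolean weekday = dow != DayOfWeek.SATURDAY && dow != DayOfWeek.SUNDAY;
        LocalTime time = t.toLocalTime();
        return weekday && !time.isBefore(OPEN) && time.isBefore(CLOSE);
    }

    /** Scheduled merge/optimize jobs are allowed only outside business hours. */
    static boolean mergesAllowed(LocalDateTime t) {
        return !isBusinessHours(t);
    }
}
```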

RE: Help to determine why an optimized index is proportionally too big.

2009-04-09 Thread Dan OConnor
Thanks for the feedback, Chris. Can you (or someone else on the list) tell me about the IndexMerge tool? Thanks, Dan -----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 09, 2009 6:46 PM To: java-user@lucene.apache.org Subject: Re: Help to det

RE: simultaneous indexing and searching causing intermittently long searches.

2009-04-04 Thread Dan OConnor
riginal Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Saturday, April 04, 2009 6:38 AM To: java-user@lucene.apache.org Subject: Re: simultaneous indexing and searching causing intermitently long searches. On Fri, Apr 3, 2009 at 10:21 PM, Dan OConnor wrote: >

simultaneous indexing and searching causing intermittently long searches.

2009-04-03 Thread Dan OConnor
All, I have several questions regarding query response time, and I would appreciate any help that can be provided. We have a system that indexes approximately 200,000 documents per day at a fairly constant rate and holds them for 8 days in a cfs-style (compound file) file-system directory index. The index is
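For a rolling retention window like this one, the steady-state index size and the expiry rule can be written down directly. This sketch assumes "holds them for 8 days" means the current day plus the previous 7 are kept; `RetentionWindow` is a hypothetical name. At 200,000 documents per day the index settles at about 1.6 million documents.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper for an 8-day rolling retention window. */
class RetentionWindow {
    static final long DOCS_PER_DAY = 200_000L;
    static final int RETENTION_DAYS = 8;

    /** Documents in the index once the window is full: 200,000 x 8. */
    static long steadyStateDocs() {
        return DOCS_PER_DAY * RETENTION_DAYS;
    }

    /** True if a document indexed on {@code indexed} has aged out by {@code today}. */
    static boolean isExpired(LocalDate indexed, LocalDate today) {
        // Ages 0..7 days are kept; 8 or more days old falls outside the window.
        return ChronoUnit.DAYS.between(indexed, today) >= RETENTION_DAYS;
    }
}
```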

Help to determine why an optimized index is proportionally too big.

2009-04-01 Thread Dan OConnor
All: We are using Java Lucene 2.3.2 to index a fairly large number of documents (roughly 400,000 per day). We have divided the time history into stages of different depths: our first stage covers 8 days, and our next stage covers 22. The index directory for the first stage is approximately 20 GB when fully op
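A quick sizing check on these figures, assuming the first stage holds exactly 8 days x 400,000 documents and that "20G" means 20 x 10^9 bytes: that is 3.2 million documents at about 6,250 bytes each. If size scaled linearly, the 22-day stage (8.8 million documents) would come to roughly 55 GB. That linear figure is only a baseline for judging whether an optimized index is "proportionally too big"; `StageSizing` is a hypothetical helper.

```java
/** Hypothetical back-of-the-envelope sizing for the staged indexes. */
class StageSizing {
    static final long DOCS_PER_DAY = 400_000L;

    /** Documents held by a stage covering the given number of days. */
    static long docsInStage(int days) {
        return DOCS_PER_DAY * days;
    }

    /** Average on-disk bytes per document for an index of the given size. */
    static double bytesPerDoc(double indexBytes, long docs) {
        return indexBytes / docs;
    }

    /** Size expected if the index grows linearly with document count. */
    static double expectedBytes(double bytesPerDoc, long docs) {
        return bytesPerDoc * docs;
    }
}
```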