Re: Query performance in Lucene 4.x

2013-10-01 Thread Desidero
Uwe, I was using a bounded thread pool. I don't know if the problem was the task overload or something about the actual efficiency of searching a single segment rather than iterating over multiple AtomicReaderContexts, but I'd lean toward task overload. I will do some testing tonight to find out.

RE: Query performance in Lucene 4.x

2013-10-01 Thread Uwe Schindler
Hi, use a bounded thread pool. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
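Uwe's advice can be sketched with java.util.concurrent alone. The thread count, queue capacity, and rejection policy below are illustrative assumptions, not anything specified in the thread; the comment shows where such a pool would typically be handed to Lucene.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // A bounded pool: fixed thread count, bounded work queue, and
    // CallerRunsPolicy so submitters are throttled instead of the
    // queue growing without limit (the "task overload" suspected above).
    public static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 64);
        // The executor would then be passed to the searcher, e.g.:
        // IndexSearcher searcher = new IndexSearcher(reader, pool);
        System.out.println(pool.getMaximumPoolSize()); // prints 4
        pool.shutdown();
    }
}
```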

Re: Query performance in Lucene 4.x

2013-10-01 Thread Desidero
For anyone who was wondering, this was actually resolved in a different thread today. I misread the information in the IndexSearcher(IndexReader,ExecutorService) constructor documentation - I was under the impression that it was submitting a thread for each index shard (MultiReader wraps 20 shards,

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Benson Margulies
On Tue, Oct 1, 2013 at 3:58 PM, Desidero wrote: > Benson, > > Rather than forcing a random number of small segments into the index using > maxMergedSegmentMB, it might be better to split your index into multiple > shards. You can create a specific number of balanced shards to control the > paralle

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Desidero
Benson, Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. You can create a specific number of balanced shards to control the parallelism and then forceMerge each shard down to 1 segment to avo

Re: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread gudiseashok
I am really sorry if something confused you. As I said, I am indexing a folder which contains mylogs.log, mylogs1.log, mylogs2.log, etc.; I am not indexing them as flat files. I have tokenized each line of text with a regex and am storing the pieces as fields like "messageType", "timeStamp", "message". So

Re: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread Ian Lea
I'm still a bit confused about exactly what you're indexing, when, but if you have a unique id and don't want to add or update a doc that's already present, add the unique id to the index and search (TermQuery probably) for each one and skip if already present. Can't you change the log rotation/co
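Ian's skip-if-already-present check could be sketched like this against the Lucene 4.x API; the field name "uid" is an assumption, and TotalHitCountCollector is used because only the count is needed, not scored hits.

```java
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

public class DedupCheck {
    // Returns true if a document with the given unique id is already
    // in the index; the caller can then skip re-adding it.
    public static boolean alreadyIndexed(IndexSearcher searcher, String uid)
            throws IOException {
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(new TermQuery(new Term("uid", uid)), collector);
        return collector.getTotalHits() > 0;
    }
}
```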

Re: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread gudiseashok
Hi, basically my log folder consists of four log files, abc.log, abc1.log, abc2.log, abc3.log, as my log appender rotates them. Every 30 minutes the content of all these files changes; for example, after a 30-minute refresh the content of abc1.log will be replaced with the existing abc.log content and ab

Re: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread Ian Lea
Milliseconds as unique keys are a bad idea unless you are 100% certain you'll never be creating 2 docs in the same millisecond. And are you saying that log record A1 from file a.log indexed at 14:00 will have the same unique id as the same record from the same file indexed at 14:30, or will it be di
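One alternative to millisecond timestamps, in the spirit of Ian's objection: derive the key from what actually identifies the record. File name plus line number is only one possible choice and assumes lines keep their positions between reads; it is a sketch, not the thread's agreed solution.

```java
public class RecordIds {
    // Builds a key that is stable across re-reads of the same file:
    // the same record always yields the same id, so updateDocument()
    // replaces it instead of duplicating it. A millisecond timestamp,
    // by contrast, collides when two records share a millisecond and
    // differs when the same record is re-read later.
    public static String uniqueId(String fileName, long lineNumber) {
        return fileName + ":" + lineNumber;
    }

    public static void main(String[] args) {
        System.out.println(uniqueId("abc.log", 17)); // prints abc.log:17
    }
}
```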

RE: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread gudiseashok
I'm afraid my document in the above code already has a unique key (with milliseconds; I hope this is enough to differentiate it from other records). My requirement is simple: I have a folder with a.log, b.log and c.log files which will be updated every 30 minutes, and I want to update the index of the

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Michael McCandless
You might want to set a smallish maxMergedSegmentMB in TieredMergePolicy to "force" enough segments in the index ... sort of the opposite of optimizing. Really, IndexSearcher's approach to using one thread per segment is rather silly, and, it's annoying/bad to expose change in behavior due to segm
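Mike's suggestion as a configuration sketch for Lucene 4.x. The 64 MB ceiling is an illustrative assumption: small enough to keep several segments alive so the per-segment search threads have work to fan out over.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class SmallSegmentsConfig {
    // Cap merged segment size so the index keeps multiple segments --
    // "sort of the opposite of optimizing", as described above.
    public static IndexWriterConfig create() {
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setMaxMergedSegmentMB(64.0); // assumption: tune to your index size

        IndexWriterConfig iwc = new IndexWriterConfig(
                Version.LUCENE_44, new WhitespaceAnalyzer(Version.LUCENE_44));
        iwc.setMergePolicy(tmp);
        return iwc;
    }
}
```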

Re: Multi server

2013-10-01 Thread Michael McCandless
Maybe Lucene's new replication module is useful for this? Mike McCandless http://blog.mikemccandless.com On Mon, Sep 30, 2013 at 3:08 PM, Neda Grbic wrote: > Hi all > > I'm hoping to use Lucene in my project, but I have two master-master > servers. Is there any good tutorial how to make Lucene

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Adrien Grand
Hi Benson, On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies wrote: > The multithreaded index searcher fans out across segments. How aggressively > does 'optimize' reduce the number of segments? If the segment count goes > way down, is there some other way to exploit multiple cores? forceMerge[1

Re: Multi server

2013-10-01 Thread Ian Lea
I'm not aware of a lucene rather than Solr or whatever tutorial. A search for something like "lucene sharding" will get hits. Why don't you want to use Solr or Katta or similar? They've already done much of the hard work. How much data are you talking about? What are your master-master require

RE: Reindexing problem: Indexing folder size keeps growing for the same remote folder

2013-10-01 Thread Uwe Schindler
You have to call updateDocument with the unique key of the document to update. The unique key must be a separate, indexed, not necessarily stored key. addDocument just adds a new instance of the document to the index; it cannot determine if it's a duplicate.
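Uwe's point in code, against the Lucene 4.x API; the field name "uid" and the StringField/TextField choices are assumptions for illustration. updateDocument atomically deletes any document whose "uid" term matches and adds the new one, whereas addDocument would simply append a duplicate.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class Upserter {
    // Index-or-replace a log record keyed by its unique id. The key is
    // indexed un-analyzed (StringField) so the Term lookup matches
    // exactly; it does not need to be stored.
    public static void upsert(IndexWriter writer, String uid, String message)
            throws IOException {
        Document doc = new Document();
        doc.add(new StringField("uid", uid, Field.Store.NO));
        doc.add(new TextField("message", message, Field.Store.YES));
        writer.updateDocument(new Term("uid", uid), doc);
    }
}
```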