Re: Search for documents where field does not exist?

2005-06-17 Thread Chris Collins
I don't believe you can. What you can do is index a NULL term. That is, a term that will not occur naturally in your index. Then you can do such a search: field:NULL This requires you to index NULL when you know the field is going to be empty. C --- Dan Armbrust <[EMAIL PROTECTED]> wrote: >
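
The sentinel-term idea in the message above can be sketched with a toy in-memory inverted index. This is not Lucene's API: ToyIndex, NULL_TERM and the sample documents are hypothetical names made up for illustration, and the whitespace "analyzer" is an assumption. It only shows why indexing a placeholder term at write time makes "field does not exist" an ordinary term lookup.

```python
from collections import defaultdict

# Hypothetical sentinel; assumed never to occur as a real term in the index.
# Real terms are lowercased below, so the uppercase sentinel cannot collide.
NULL_TERM = "NULL"


class ToyIndex:
    """Tiny in-memory inverted index: (field, term) -> set of doc ids."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.postings = defaultdict(set)

    def add(self, doc_id, doc):
        for field in self.fields:
            value = doc.get(field)
            if value:
                terms = value.lower().split()  # naive whitespace analysis
            else:
                terms = [NULL_TERM]  # field absent or empty: index the sentinel
            for term in terms:
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        # Plain term lookup, the analogue of a field:term query.
        return sorted(self.postings.get((field, term), set()))


index = ToyIndex(fields=["title", "author"])
index.add(1, {"title": "Lucene in Action", "author": "Hatcher"})
index.add(2, {"title": "Untitled notes"})          # no author field at all
index.add(3, {"title": "Draft", "author": ""})     # empty author field

# Equivalent in spirit to searching author:NULL
missing_author = index.search("author", NULL_TERM)
```

As the message notes, the cost of this design is at indexing time: every document has to be given the sentinel explicitly whenever the field is known to be empty.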

Re: Performance with multi index

2005-06-16 Thread Chris Collins
I contest to the value of increasing the minMergeDocs. It directly affects how much IO gets performed in indexing. Splitting it into multiple indices (if you want to pay the price of complexity) may well increase your throughput. Assuming you are not utilizing all of the resources the sys

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
Yeh I think the bug is related to an array copy that expects 1k blocks (if I recall it was RAMDirectory or something like that). C --- Kevin Burton <[EMAIL PROTECTED]> wrote: > Chris Collins wrote: > > >Well I am currently looking at merging too. In my application mer

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
you blow up pretty quickly. regards C --- "Peter A. Friend" <[EMAIL PROTECTED]> wrote: > > On Jun 10, 2005, at 9:33 AM, Chris Collins wrote: > > > How many documents did you try to index? > > Only about 4000 at the moment. > > > I am using a rela

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
<[EMAIL PROTECTED]> wrote: > > On Jun 9, 2005, at 11:52 PM, Chris Collins wrote: > > > In that case I have a different performance issue, that is that > > FSInputStream > > and FSOutputStream inherit the buffer size of 1k from OS and IS > > This would be

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
ast (CPU Bound) and one where it is slow (IO Bound). I have to capitalize on the effects of both to get my job done and each of them has distinctive challenges. Regards Chris --- John Haxby <[EMAIL PROTECTED]> wrote: > Chris Collins wrote: > > >Ok that part isn't surprising.

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
Yes, that would line up with being pretty much CPU bound. So if you were to have 2 Xeons with HT then you kinda have almost 4 resources (threads) of execution you could take advantage of. So from my current tests where I have multiple threads producing work for an index and one index writer (o

Re: Optimizing indexes with mulitiple processors?

2005-06-10 Thread Chris Collins
Kevin, I would be curious to know more about your merging issues. As I mentioned I am concerned about merge time and in my case it's against a filer that of course has high latency. The other issue is that I effectively index things with a primary key. I need to ensure an efficient way of prevent

Re: Optimizing indexes with mulitiple processors?

2005-06-09 Thread Chris Collins
increased the FSOutputStream and FSInputStream buffers and got it not to blow up on array copies I would love to know the short cut. Chris --- Kevin Burton <[EMAIL PROTECTED]> wrote: > Chris Collins wrote: > > >To follow up. I was surprised to find that from the experiment of i

Re: Optimizing indexes with mulitiple processors?

2005-06-09 Thread Chris Collins
case analyzer and this was a slightly hacked lucene-1.4.3 source code line in which I swapped out some of the synchronized data structures (Hashtable -> HashMap, Vector -> ArrayList). <> --- Chris Collins <[EMAIL PROTECTED]> wrote: > I found with a fast RAID controller that I ca

Re: Optimizing indexes with mulitiple processors?

2005-06-09 Thread Chris Collins
I found with a fast RAID controller that I can easily be CPU bound; some of the IO is related to latency. You can hide the latency by having overlapping IO (you get that with multiple indexers going on at the same time). I think there possibly could be more horsepower you can get out of the inver

Re: Optimizing indexes with mulitiple processors?

2005-06-09 Thread Chris Collins
You can segment your indexes into n physical parts (perhaps 4), then index those n parts concurrently. When you query you will use some kind of multi searcher to span the parts. The one thing you may care about is that if you are going to do a recrawl / update of documents against the existing ind