2.9 per segment searching/caching

2009-10-21 Thread Bill Au
Since Lucene 2.9 has per segment searching/caching, does query performance degrade less than before (2.9) as more segments are added to the index? Bill

support for PayloadTermQuery in MoreLikeThis

2009-09-09 Thread Bill Au
Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic solution would be to add an enable method to enable PayloadTermQuery, keeping TermQu

Re: Replicating Lucene Index with out SOLR

2008-08-28 Thread Bill Au
The snapinstaller script invokes the commit command to trigger Solr to do a commit, which opens a new index reader and then auto-warms the caches. You will need to replace that with your own code to do the same for your Lucene index. On Thu, Aug 28, 2008 at 1:47 AM, rahul_k123 <[EMAIL PROTECTED]> w

Re: Replicating Lucene Index with out SOLR

2008-08-28 Thread Bill Au
Solr uses Doug's rsync method to do replication. The scripts are pretty much standalone and do not require Solr. They should work on any Lucene index. Bill On Wed, Aug 27, 2008 at 11:52 PM, Kent Fitch <[EMAIL PROTECTED]> wrote: > Check out this recipe for using rsync by Doug Cutting: > http://

Re: Indexing source code files

2008-02-29 Thread Bill Au
There is an open source project, OpenGrok, that uses Lucene for indexing and searching source code: http://opensolaris.org/os/project/opengrok/ It has Analyzers for different types of source files. It does link source code to requirements but you can take a look at the source code to see how it do

Re: open file descriptors for deleted index files

2007-09-04 Thread Bill Au
Closing the old IndexSearcher should take care of this problem for you. Take a look at Solr. It opens a new IndexSearcher and directs all requests to the new one. It then closes the old IndexSearcher when all the requests that it is serving have completed. Bill On 9/4/07, Tony Qian <[EMAIL PROTECTED]
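
The swap-then-close pattern described in this message can be sketched with plain reference counting. This is only a minimal illustration, not Solr's actual code: `SearcherHolder`, `Lease`, `acquire`, and `swap` are hypothetical names, and a generic `Closeable` stands in for an IndexSearcher so the sketch has no Lucene dependency.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical holder: hands out the current resource to requests and
// closes a replaced resource only after its last in-flight request ends.
final class SearcherHolder<T extends Closeable> {

    // Pairs a resource with a reference count guarded by the holder's lock.
    private static final class Ref<T> {
        final T resource;
        int count = 1; // the holder's own reference
        Ref(T resource) { this.resource = resource; }
    }

    // One request's claim on a resource; close() gives the claim back.
    final class Lease implements Closeable {
        private final Ref<T> ref;
        private boolean released;
        private Lease(Ref<T> ref) { this.ref = ref; }

        T get() { return ref.resource; }

        @Override
        public void close() {
            synchronized (SearcherHolder.this) {
                if (released) return; // closing a lease twice is a no-op
                released = true;
            }
            release(ref);
        }
    }

    private Ref<T> current;

    SearcherHolder(T initial) { current = new Ref<>(initial); }

    // Called at the start of each request.
    synchronized Lease acquire() {
        current.count++;
        return new Lease(current);
    }

    // Directs new requests to 'next' and drops the holder's reference to
    // the old resource, which closes once its outstanding leases finish.
    void swap(T next) {
        Ref<T> old;
        synchronized (this) {
            old = current;
            current = new Ref<>(next);
        }
        release(old);
    }

    private void release(Ref<T> ref) {
        boolean last;
        synchronized (this) { last = --ref.count == 0; }
        if (last) {
            try {
                ref.resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Each request would wrap its search in an `acquire()` and `close()` pair, and the code that reopens the index would call `swap` with the new searcher. Closing the old searcher is what releases its file descriptors for deleted index files.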

Re: Memory leak (JVM 1.6 only)

2007-05-18 Thread Bill Au
I actually had to deal with a leak in non-heap native memory once. I am running on Linux so I just use good old "ps" to monitor native memory usage. Bill On 5/18/07, Stephen Gray <[EMAIL PROTECTED]> wrote: Thanks. If the extra memory allocated is native memory I don't think jconsole includes

Re: QueryParser, PrefixQuery, and case sensitivity

2007-05-06 Thread Bill Au
say, 8G huge), I'd index everything in, say, lower case. And ditto for your query parsing. If you need to return data to the user in mixed case, then you can *store* (but perhaps not *index*) the display fields. So you search on one field and return data from another. Best Erick On 5/4/07

QueryParser, PrefixQuery, and case sensitivity

2007-05-04 Thread Bill Au
I have an index with both case-sensitive and case-insensitive fields. I am trying to use a QueryParser to accept queries from end users for searching. The default behavior of QueryParser is to lowercase the prefix text to create the PrefixQuery. So wildcard search on the case sensitive fields

Re: Optimizing indexes with multiple processors?

2005-06-10 Thread Bill Au
That's not true in my case. The CPU never went over 50%. I/O wait is often greater than the CPU and can be as high as 90%. Bill On 6/10/05, Kevin Burton <[EMAIL PROTECTED]> wrote: > Bill Au wrote: > > >Optimize is disk I/O bound. So I am not sure what multipl

Re: Optimizing indexes with multiple processors?

2005-06-09 Thread Bill Au
Optimize is disk I/O bound. So I am not sure what multiple CPUs will buy you. Bill On 6/9/05, Kevin Burton <[EMAIL PROTECTED]> wrote: > Is it possible to get Lucene to do an index optimize on multiple > processors? > > It's a single threaded algorithm currently right? > > It's a shame since I ha

Re: scalability w/ number of fields

2005-04-05 Thread Bill Au
The compound index structure is meant for indexes with a large number of fields. I was watching the files in the index directory of my compound index while it was being optimized. The IndexWriter that I used was set to use the compound file format. It looks to me that Lucene first combined all existing segme