How to rebuild index

2011-01-21 Thread 黄靖宇
Hi, I am new to Lucene. Recently I was assigned some Lucene-related work items. Now there is one problem. Before, we used StandardAnalyzer in our application, and our application has been online for about two years. Now we must write a custom Analyzer to replace the StandardAnalyzer for enhanc

Re: How to rebuild index

2011-01-21 Thread Lahiru Samarakoon
Hi, you have been using a system for two years and it used an index created with Lucene and the StandardAnalyzer, so there must be index-creation code in your system. Anyway, since you have the book “*Lucene in action*” you can find out how to create an index by reading chapter 2 (Indexing). Please

RE: How to rebuild index

2011-01-21 Thread Uwe Schindler
Hi, > "If you’re changing analyzers, you should rebuild your index using the new analyzer so that all documents are analyzed in the same manner." It says everything: Take your original data and re-create the index. Indexing is a lossy operation, so you must recreate the index using *all* the orig
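Uwe's advice (take the original data and re-create the index with the new analyzer) can be sketched roughly as follows, assuming Lucene 3.0-era APIs. StandardAnalyzer stands in for the custom analyzer, and the hard-coded strings stand in for the original source data; both are placeholders, not part of the thread:

```java
import java.io.File;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Reindex {
    public static void main(String[] args) throws Exception {
        // Substitute your custom analyzer here; StandardAnalyzer is a stand-in.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        // Build into a *new* directory so the old index stays usable meanwhile.
        Directory dir = FSDirectory.open(new File("new-index"));
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Feed *all* of the original source documents through the new analyzer.
        for (String text : Arrays.asList("first document", "second document")) {
            Document doc = new Document();
            doc.add(new Field("body", text, Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
    }
}
```

Running against a separate directory is what lets you swap the finished index in atomically, as suggested later in the thread.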

Re: How to rebuild index

2011-01-21 Thread 黄靖宇
Hmm, I see. Thanks very much. 2011/1/21 Uwe Schindler

Re: Paging with Lucene

2011-01-21 Thread Ian Lea
The standard recommendation for paging is to re-execute the search for second and subsequent pages and return the second or subsequent chunk of hits. Would that not work in your case? An alternative is to read and cache hits from the initial search but that is generally more complex. -- Ian. O
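The standard recommendation Ian describes can be sketched against the 3.x search API. The `searcher`, `query`, and the stored `"title"` field are assumptions for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class Paging {
    /** Page `pageIndex` (zero-based): re-run the query, fetch enough hits to
     *  cover all pages up to and including this one, then skip the earlier
     *  ones. */
    static void printPage(IndexSearcher searcher, Query query,
                          int pageIndex, int pageSize) throws Exception {
        TopDocs top = searcher.search(query, (pageIndex + 1) * pageSize);
        int start = pageIndex * pageSize;
        for (int i = start; i < top.scoreDocs.length; i++) {
            Document doc = searcher.doc(top.scoreDocs[i].doc);
            System.out.println(doc.get("title"));   // assumed stored field
        }
    }
}
```

Re-running the query for each page sounds wasteful, but as noted later in the thread, Lucene is usually fast enough that this beats the complexity of caching hits.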

Re: How to rebuild index

2011-01-21 Thread Erdinc Akkaya
First of all, try it in a different folder than your current index folder. The new analyzer will produce a different index from the same data. First create the index in a different folder, then replace your current index files with the new ones. If it fits, then replace the code and it will work. 2011/1/21 黄

gracefully interrupting an optimize

2011-01-21 Thread v . sevel
Hi, Each night I optimize an index that contains 35 million docs. It takes about 1.5 hours. For maintenance reasons, it may happen that the machine gets rebooted. In that case, the server gets a chance to gracefully shut down, but eventually, the reboot script will kill the processes that did not

AW: Paging with Lucene

2011-01-21 Thread Clemens Wyss
The problem is that, due to the "filtering" AFTER having searched the index, we don't know how many TopDocs to read in order to have "enough" for page x. Does Lucene's search allow injecting a kind of "voter"/"vetoer" which is called for any hit (ScoreDoc) Lucene has encountered? This voter should

Re: How to rebuild index

2011-01-21 Thread Jack Krupansky
The best thing is to re-index from your original source data, but if that is not available, you can also re-index stored fields, assuming that you created the index using stored fields for text fields. You would have to write custom code to retrieve the stored values (not the actual terms since
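The stored-fields fallback Jack describes might look like this against the 3.0 API. Note the caveat in the comments: a Document read back from the index only carries stored values, so this only works if every searchable text field was stored, and you may need to rebuild each field with the correct Store/Index flags rather than re-adding the retrieved Document wholesale:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class ReindexFromStored {
    /** Copies every live document's stored fields into a new index, where the
     *  new analyzer re-tokenizes them on the way in. */
    static void rebuild(Directory oldDir, Directory newDir, Analyzer newAnalyzer)
            throws Exception {
        IndexReader reader = IndexReader.open(oldDir, true); // read-only
        IndexWriter writer = new IndexWriter(newDir, newAnalyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) continue;     // skip deleted docs
            Document stored = reader.document(i);  // only stored fields survive
            // Caveat: check each field's Store/Index flags; you may need to
            // reconstruct fields explicitly instead of re-adding as-is.
            writer.addDocument(stored);
        }
        writer.close();
        reader.close();
    }
}
```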

Re: Paging with Lucene

2011-01-21 Thread Ian Lea
> The problem is, that due to the "filtering" AFTER having searched the index, > we don't know how many TopDocs to read in order have "enough" for page x. Think of a number and double it? Unless the number gets really high, Lucene is generally plenty fast enough. Or read n and if, after filtering
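Ian's "think of a number and double it" loop can be expressed in plain Java, independent of Lucene. The `Search` and `Filter` interfaces here are hypothetical stand-ins for "re-run the query for the top n" and "the post-search filter":

```java
import java.util.ArrayList;
import java.util.List;

public class FilteredPager {
    /** Re-runs the search with a doubling window until the requested page of
     *  filtered hits is filled (or the hits are exhausted). */
    static <T> List<T> page(Search<T> search, Filter<T> keep,
                            int pageStart, int pageSize) {
        int n = Math.max((pageStart + pageSize) * 2, pageSize);
        while (true) {
            List<T> hits = search.top(n);          // top-n hits, best first
            List<T> kept = new ArrayList<T>();
            for (T hit : hits) if (keep.accept(hit)) kept.add(hit);
            boolean exhausted = hits.size() < n;   // no more hits to fetch
            if (kept.size() >= pageStart + pageSize || exhausted) {
                int from = Math.min(pageStart, kept.size());
                int to = Math.min(pageStart + pageSize, kept.size());
                return new ArrayList<T>(kept.subList(from, to));
            }
            n *= 2;                                // think of a number, double it
        }
    }

    interface Search<T> { List<T> top(int n); }
    interface Filter<T> { boolean accept(T hit); }
}
```

Each retry re-executes the search with a larger n, which is usually cheaper in practice than caching and invalidating result sets.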

RE: Paging with Lucene

2011-01-21 Thread Uwe Schindler
You can write a custom Collector that does this (just not delegating the collect(int) call) and wrap TopDocsCollector with that. Alternative: Plug in a Filter that filters your documents during the query. As doing this on iterating hits is often costly, the ideal solution would be to create a cach
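A sketch of the delegating Collector Uwe describes, against the Lucene 3.0 Collector API. The `accept()` method is the hypothetical "voter" hook; override it with your filtering logic:

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Wraps another Collector and delegates collect() only for accepted docs. */
public class VetoingCollector extends Collector {
    private final Collector delegate;
    private int docBase;

    public VetoingCollector(Collector delegate) {
        this.delegate = delegate;
    }

    /** The "voter": return false to veto a hit. Override with real logic. */
    protected boolean accept(int globalDoc) {
        return true;
    }

    @Override public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }

    @Override public void collect(int doc) throws IOException {
        if (accept(docBase + doc)) delegate.collect(doc);
    }

    @Override public void setNextReader(IndexReader reader, int docBase)
            throws IOException {
        this.docBase = docBase;
        delegate.setNextReader(reader, docBase);
    }

    @Override public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}
```

Usage would be along the lines of wrapping `TopScoreDocCollector.create(n, true)` and passing the wrapper to `searcher.search(query, wrapper)`, so only non-vetoed hits ever reach the TopDocs.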

Re: gracefully interrupting an optimize

2011-01-21 Thread Michael McCandless
If you call optimize(false), that'll return immediately but run the optimize "in the background" (assuming you are using the default ConcurrentMergeScheduler). Later, when it's time to stop optimizing, call IW.close(false), which will abort any running merges yet keep any merges that had finished
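Mike's two calls fit together like this; the shutdown hook is an illustrative assumption about how the "reboot script" scenario might trigger the close:

```java
import org.apache.lucene.index.IndexWriter;

public class BackgroundOptimize {
    /** Starts the optimize without blocking, and aborts it cleanly on JVM
     *  shutdown (assumes the default ConcurrentMergeScheduler). */
    static void optimizeInBackground(final IndexWriter writer) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override public void run() {
                try {
                    // false = don't wait for merges: abort running ones, but
                    // keep any merges that had already finished.
                    writer.close(false);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        writer.optimize(false); // returns immediately; merges run in background
    }
}
```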

Re: gracefully interrupting an optimize

2011-01-21 Thread Paul Libbrecht
Would that happen "automagically" at finalization? paul On 21 Jan 2011 at 15:13, Michael McCandless wrote: > If you call optimize(false), that'll return immediately but run the > optimize "in the background" (assuming you are using the default > ConcurrentMergeScheduler). > > Later, when i

Re: gracefully interrupting an optimize

2011-01-21 Thread Michael McCandless
No. If you just do IW.close() <-- no boolean specified, then that defaults to IW.close(true) which means "wait for all BG merges to finish". So "normally" IW.close() reserves the right to take a long time. But IW.close(false) should finish relatively quickly... Mike On Fri, Jan 21, 2011 at 9:2

Performing a query on token length

2011-01-21 Thread Camden Daily
Hello all, Does anyone know if it is possible in Lucene to do a query based on the string length of the value of a field? For example, if I wanted all index matches where a specific field like 'first_name' was between 10 and 20 characters. Thanks! -Camden Daily

Re: Performing a query on token length

2011-01-21 Thread Ian Lea
Not directly, but you could index a NumericField called "length" and do a NumericRangeQuery on it. Or loop through all the terms checking length. But that isn't a query and will be slow. -- Ian. On Fri, Jan 21, 2011 at 3:15 PM, Camden Daily wrote: > Hello all, > > Does anyone know if it is p
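Ian's suggestion in code, against the 3.0 API; the `first_name_len` field name is an assumption for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class LengthField {
    /** Index time: record the value's length in a parallel numeric field. */
    static Document withLength(String name) {
        Document doc = new Document();
        doc.add(new Field("first_name", name,
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new NumericField("first_name_len").setIntValue(name.length()));
        return doc;
    }

    /** Search time: all docs whose first_name is 10-20 characters long. */
    static Query lengthBetween10And20() {
        return NumericRangeQuery.newIntRange("first_name_len", 10, 20,
                true, true); // both endpoints inclusive
    }
}
```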

Re: Performing a query on token length

2011-01-21 Thread Jack Krupansky
A wildcard query with 10 leading question marks, each of which requires a single character. This would also depend on leading wildcards being enabled in your query parser (if you are using one.) first_name:??????????* The performance would not necessarily be great, but functionally it would do

Re: Performing a query on token length

2011-01-21 Thread Jack Krupansky
Oops... I only solved half the problem; the other half was to limit length to 20, which would be done with a negated leading wildcard of 21 question marks: first_name:??????????* -first_name:?????????????????????* -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Frida
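The same pair of clauses can be built programmatically, which sidesteps the query parser's leading-wildcard setting entirely (a sketch; the performance caveats from the thread still apply):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.WildcardQuery;

public class LengthByWildcard {
    /** Matches first_name values of 10-20 characters: require at least 10
     *  (ten ?'s then *) and exclude 21 or more (twenty-one ?'s then *). */
    static BooleanQuery build() {
        BooleanQuery q = new BooleanQuery();
        q.add(new WildcardQuery(new Term("first_name", "??????????*")),
                BooleanClause.Occur.MUST);
        q.add(new WildcardQuery(
                new Term("first_name", "?????????????????????*")),
                BooleanClause.Occur.MUST_NOT);
        return q;
    }
}
```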

Re: Performing a query on token length

2011-01-21 Thread Ian Lea
Wouldn't that also match names with length > 20? -- Ian. On Fri, Jan 21, 2011 at 3:26 PM, Jack Krupansky wrote: > A wildcard query with 10 leading question marks, each of which requires a > single character. This would also depend on leading wildcards being enabled > in your query parser (if y

Re: Performing a query on token length

2011-01-21 Thread Camden Daily
Thank you Ian and Jack, I believe I'll go with simply creating a NumericField for the length, as that will result in the best performance. -Camden On Fri, Jan 21, 2011 at 10:35 AM, Ian Lea wrote: > Wouldn't that also match names with length > 20? > > > -- > Ian. > > > On Fri, Jan 21, 2011 at 3

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread software visualization
Hi, sorry for the long delay. The idea is that a single user is editing a single document. As they edit, any indexes built against the document become stale, actually wrong. Example: references to specific localities within this document are all instantly wrong the first time a user types a new be

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread Umesh Prasad
Hi, One workaround would be to version the documents and store the version as well as the timestamp of the indexed document in the index. Reading between the lines, I assume that the document is a) stored in some DB/file and b) indexed in a Lucene index. Users search on b) and get document ids, but documents are d
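The version check Umesh suggests can be expressed as a tiny post-search filter in plain Java. The `Hit` type and the id-to-live-version map are hypothetical stand-ins for "what the index returned" and "what the DB currently holds":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StalenessFilter {
    /** One search hit: the document id plus the version stamped at index time. */
    static class Hit {
        final String id;
        final long indexedVersion;
        Hit(String id, long indexedVersion) {
            this.id = id;
            this.indexedVersion = indexedVersion;
        }
    }

    /** Drops hits whose indexed version lags the live version in the DB. */
    static List<Hit> dropStale(List<Hit> hits, Map<String, Long> liveVersions) {
        List<Hit> fresh = new ArrayList<Hit>();
        for (Hit h : hits) {
            Long live = liveVersions.get(h.id);
            // T2 (indexed) < T1 (live) => the index entry is stale: skip it.
            if (live == null || h.indexedVersion >= live) fresh.add(h);
        }
        return fresh;
    }
}
```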

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread software visualization
If I understand you correctly, I think that this: "If T2 < T1, skip the result" will always be the case. The live, being-edited document is always "later" in time than the indexed information about it. On Fri, Jan 21, 2011 at 9:11 PM, Umesh Prasad wrote: > Hi, > One work around would be to

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread mike anderson
[x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On

Lucene , hits per document

2011-01-21 Thread Sharma Kollaparthi
Hi, I have started to use Lucene for searching in HTML files. Is it possible to get hits per document when we search for phrases like "Hello World" and wildcard searches like "te?t"? I managed to return the number of hits per document if there is only one term, using term frequency vecto

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread Khosro Asgharifard
[x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) O

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread Lance Norskog
There's a feature in lucene called an "instantiated" index. This has all of the Lucene data structures directly as objects instead of serialized to disk or a RAMDirectory. It never needs to be committed: you index a document and it is immediately searchable. It is larger and faster than a normal in

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread Umesh Prasad
Nope, it won't always be the case. Users will not always be editing the document. They will edit the document and then save, which will be persisted in the DB. You can use DB triggers to push it into an indexing queue, from which the indexer can regularly pick up the document for indexing. You can schedule you

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread Umesh Prasad
[] ASF Mirrors (linked in our release announcements or via the Lucene website) [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On F