How do you know when index.optimize has finished ?

2011-01-28 Thread Paul Taylor
I'm building six different indexes in series, at the end of building an index I call optimize() and then close() the writer, then move onto the next one. I build them in series because they are extracting the data from a database and I don't want to overload the database. However the optimizatio

Re: How do you know when index.optimize has finished ?

2011-01-28 Thread Michael McCandless
You can call IW.waitForMerges(). Mike On Fri, Jan 28, 2011 at 4:16 AM, Paul Taylor wrote: > I'm building six different indexes in series, at the end of building an > index I call optimize() and then close() the writer, then move onto the next > one. > I build them in series because they are extr

How to index part numbers

2011-01-28 Thread Wulf Berschin
Hi, I'm poking in the dark and hope someone has some light... We have part numbers in technical documentation to retrieve. For now we have a (long) regular expression to find those in a string. The part numbers have letters, digits and (redundant) whitespace. Furthermore authors often used a
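
The redundant-whitespace part of the problem can be illustrated with a minimal sketch. It is not the regular expression mentioned in the message (which the thread does not show); the class name `PartNumberNormalizer` is hypothetical, and upper-casing is an added assumption so that differently typed variants collapse to one key.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or of the poster's code.
final class PartNumberNormalizer {
    private PartNumberNormalizer() {}

    // Drops every run of whitespace and upper-cases the remainder, so that
    // "ab 12  c" and "AB12C" produce the same string.
    // Assumption: case is not significant in a part number.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }
}
```

Applying the same normalization at index time and at query time would be one way to make whitespace variants match each other; whether that suits the real part-number format is not something the excerpt settles.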

Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
Hi Wulf, can I ask, if it is structured documentation (like XML or SGML) you're dealing with? It's because I also work with technical documentation and we do exactly what you're asking for, but it is XML data. On Fri, Jan 28, 2011 at 1:05 PM, Wulf Berschin wrote: > Hi, > > I'm poking in the d

Re: How to index part numbers

2011-01-28 Thread Wulf Berschin
Hi Karolina, yes (of course!) We have an XML element for the part numbers, but up to now they are not all tagged, thus we need regex matching as well... On 28.01.2011 13:31, Karolina Bernat wrote: Hi Wulf, can I ask, if it is structured documentation (like XML or SGML) you're dealing with? I

Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
Oh, okay. Well, for the XML part we use Apache Digester and define rules to enclose the correct elements. But I can't tell what's the best way to proceed in your case, sorry. The steps you listed here sound reasonable to me. If you want to get search hits for a part number range and highlight 'A12

Re: How to index part numbers

2011-01-28 Thread Erick Erickson
I wonder if you can define the problem away? It sounds like you have essentially random input here. That is, the users can put in whatever they want so whatever you do will be wrong sometime. Could you sidestep the problem with auto-complete and prefix queries (essentially adding * to the user's in
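
The prefix-matching idea can be sketched in plain Java, without any Lucene classes. The class `PrefixLookup` and its methods are hypothetical names used only for illustration; a sorted set stands in for the index's term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary; not a Lucene API.
final class PrefixLookup {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Returns every stored term that starts with the given prefix, in sorted order.
    // In a sorted set all such terms are contiguous, beginning at the first
    // element >= prefix, so the scan can stop at the first non-matching term.
    List<String> complete(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }
}
```

Feeding the user's partial input to `complete` behaves roughly like appending `*` to it: the user picks from terms that actually exist instead of the application guessing at the intended format.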

Token position vs. token offset - how to bring them together?

2011-01-28 Thread Karolina Bernat
Hello, since I moved on with my offset-info problem in HTML files, I got a new one trying to bring the tokens' position information together with the token/term offset information. Can someone tell me how I can get a token if I know its position? It would be nice to get the token's position from the
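
The distinction between the two notions can be shown with a small plain-Java sketch. It does not use Lucene's analysis classes; `PositionedTokens` and `Token` are hypothetical names, and whitespace splitting is an assumed stand-in for a real analyzer. Position is the token's ordinal in the stream, while offsets are character indexes into the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene's token or attribute API.
final class PositionedTokens {
    static final class Token {
        final String text;
        final int position;     // 0-based ordinal of the token in the stream
        final int startOffset;  // index of the first character in the input
        final int endOffset;    // index one past the last character

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");

    // Splits on whitespace and records both position and offsets per token,
    // so the list index is the position and each entry carries its offsets.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = NON_WHITESPACE.matcher(input);
        while (matcher.find()) {
            tokens.add(new Token(matcher.group(), tokens.size(), matcher.start(), matcher.end()));
        }
        return tokens;
    }
}
```

With both values stored side by side, looking up a token by position is a list access (`tokens.get(position)`), and its offsets come along with it. How to obtain the same pairing from an existing index is the open question in the message and is not answered by this sketch.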

Re: How to index part numbers

2011-01-28 Thread Chris Hostetter
: Subject: How to index part numbers : References: <4d428976.6010...@fastmail.fm> : In-Reply-To: <4d428976.6010...@fastmail.fm> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing

Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-28 Thread Koji Sekiguchi
(11/01/25 2:14), Paul Taylor wrote: On 22/01/2011 15:43, Koji Sekiguchi wrote: (11/01/20 22:19), Paul Taylor wrote: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in NormalizeCharMap (currently the singleMatch ju