Re: How to index part numbers

2011-01-28 Thread Chris Hostetter
: Subject: How to index part numbers
: References: <4d428976.6010...@fastmail.fm>
: In-Reply-To: <4d428976.6010...@fastmail.fm>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to

Re: How to index part numbers

2011-01-28 Thread Erick Erickson
I wonder if you can define the problem away. It sounds like you have essentially random input here. That is, users can type whatever they want, so whatever you do will be wrong some of the time. Could you sidestep the problem with auto-complete and prefix queries (essentially adding * to the user's in
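A minimal in-memory sketch of the auto-complete/prefix idea, assuming part numbers are compared after removing all whitespace and upper-casing (that normalization rule is an assumption for illustration, not something stated in the thread). In a real search index the same effect would come from a prefix query against a field holding the normalized form; this code only illustrates the matching logic and does not use any Lucene or Solr API. The class name `PartNumberCompleter` is hypothetical.

```python
import bisect
import re


def normalize(part_number: str) -> str:
    """Hypothetical canonical form: drop all whitespace and upper-case."""
    return re.sub(r"\s+", "", part_number).upper()


class PartNumberCompleter:
    """In-memory prefix lookup over normalized part numbers (illustration only)."""

    def __init__(self, part_numbers):
        # Sorted, de-duplicated keys let us locate the first candidate with bisect.
        self._keys = sorted({normalize(p) for p in part_numbers})

    def complete(self, user_input: str, limit: int = 10) -> list:
        # Normalize the user's input the same way, then treat it as a prefix
        # (the in-memory equivalent of appending '*' to the query).
        prefix = normalize(user_input)
        i = bisect.bisect_left(self._keys, prefix)
        matches = []
        # All keys sharing the prefix are contiguous in sorted order.
        while i < len(self._keys) and len(matches) < limit and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


completer = PartNumberCompleter(["A12 345", "A12 399", "A13 000", "B7 890"])
print(completer.complete("a12 3"))  # ['A12345', 'A12399']
```

Because both the stored values and the user's input pass through the same normalization, it no longer matters how either side spaced the number.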

Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
Oh, okay. Well, for the XML part we use Apache Digester and define rules to enclose the correct elements. But I can't say what the best way to proceed is in your case, sorry. The steps you listed sound reasonable to me. If you want to get search hits for a part number range and highlight 'A12

Re: How to index part numbers

2011-01-28 Thread Wulf Berschin
Hi Karolina, yes (of course!). We have an XML element for the part numbers, but up to now not all of them are tagged, so we need regex matching as well... On 28.01.2011 13:31, Karolina Bernat wrote: Hi Wulf, may I ask whether it is structured documentation (like XML or SGML) that you're dealing with? I
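A minimal sketch of the "markup first, regex as fallback" approach described in this message, assuming Python's standard `xml.etree.ElementTree`, a hypothetical element name `partno`, and a made-up part-number pattern (the thread shows neither the real element name nor the real regex). Tagged values are trusted apart from whitespace; only untagged text is scanned with the regex.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: 1-2 upper-case letters followed by 3-8 digits,
# with optional spaces/tabs anywhere between the digits.
PART_RE = re.compile(r"\b[A-Z]{1,2}(?:[ \t]*\d){3,8}\b")


def normalize(text: str) -> str:
    # Remove redundant whitespace so variants of one part number compare equal.
    return re.sub(r"\s+", "", text)


def scan(text) -> list:
    # Regex fallback for untagged text; None (no text) yields nothing.
    return [normalize(m.group()) for m in PART_RE.finditer(text or "")]


def collect_part_numbers(xml_text: str, tag: str = "partno") -> list:
    """Prefer the (hypothetical) <partno> element; use the regex elsewhere."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():  # visits elements in document order
        if el.tag == tag:
            # Tagged: trust the markup, only normalize whitespace.
            value = normalize(el.text or "")
            if value:
                found.append(value)
        else:
            # Untagged text directly inside this element, before its first child.
            found.extend(scan(el.text))
        # Text after the closing tag belongs to the parent and is untagged.
        found.extend(scan(el.tail))
    return found


doc = "<doc><p>Fit <partno>A12 345</partno> then B 7 890 12.</p><p>Spare: XY1234</p></doc>"
print(collect_part_numbers(doc))  # ['A12345', 'B789012', 'XY1234']
```

Trusting the element content means a tagged value is indexed even if it does not fit the regex, which is one way to let the markup gradually replace the pattern as more documents get tagged.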

Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
Hi Wulf, may I ask whether it is structured documentation (like XML or SGML) that you're dealing with? I ask because I also work with technical documentation and we do exactly what you're asking for, but with XML data. On Fri, Jan 28, 2011 at 1:05 PM, Wulf Berschin wrote: > Hi, > > I'm poking in the d

How to index part numbers

2011-01-28 Thread Wulf Berschin
Hi, I'm poking in the dark and hope someone can shed some light... We need to retrieve part numbers from technical documentation. For now we have a (long) regular expression to find them in a string. The part numbers consist of letters, digits and (redundant) whitespace. Furthermore, authors often used a
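The original regex is not shown in the thread, so the pattern below is a made-up stand-in. It sketches one common approach, assuming part numbers look like one or two capital letters followed by digits: match with a whitespace-tolerant pattern, then strip the redundant whitespace so every spelling collapses to a single canonical term for indexing.

```python
import re

# Hypothetical stand-in for the thread's (unshown) long regex:
# 1-2 upper-case letters, then 3-8 digits with optional spaces/tabs between them.
# [ \t]* rather than \s* keeps a match from running across line breaks.
PART_RE = re.compile(r"\b[A-Z]{1,2}(?:[ \t]*\d){3,8}\b")


def extract_part_numbers(text: str) -> list:
    """Find part numbers and return them in a whitespace-free canonical form."""
    results = []
    for match in PART_RE.finditer(text):
        # "A12 345" and "A 12345" both become "A12345", so the indexed term
        # matches no matter how the author spaced the number.
        results.append(re.sub(r"\s+", "", match.group()))
    return results


print(extract_part_numbers("Replace part A12 345 and part B 7 890 12 before use."))
# ['A12345', 'B789012']
```

For this to help at search time, the user's query would need the same whitespace stripping before it is compared against the stored terms; how that is wired into a particular search engine's analysis chain is not shown here.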